johnswentworth's Shortform

johnswentworth

post by johnswentworth · 2020-02-27T19:04:55.108Z · LW · GW · 488 comments

Comments sorted by top scores.

comment by johnswentworth · 2025-01-10T15:08:38.933Z · LW(p) · GW(p)

I think a very common problem in alignment research today is that people focus almost exclusively on a specific story about strategic deception/scheming, and that story is a very narrow slice of the AI extinction probability mass. At some point I should probably write a proper post on this, but for now here are a few off-the-cuff example AI extinction stories which don't look like the prototypical scheming story. (These are copied from a Facebook thread.)

Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren't smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).
The "Getting What We Measure" scenario from Paul's old "What Failure Looks Like [LW · GW]" post.
The "fusion power generator scenario [LW · GW]".
Perhaps someone trains a STEM-AGI, which can't think about humans much at all. In the course of its work, that AGI reasons that an oxygen-rich atmosphere is very inconvenient for manufacturing, and aims to get rid of it. It doesn't think about humans at all, but the human operators can't understand most of the AI's plans anyway, so the plan goes through. As an added bonus, nobody can figure out why the atmosphere is losing oxygen until it's far too late, because the world is complicated and becomes more so with a bunch of AIs running around and no one AI has a big-picture understanding of anything either (much like today's humans have no big-picture understanding of the whole human economy/society).
People try to do the whole "outsource alignment research to early AGI" thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they're already on the more-powerful next gen, so it's too late.
The classic overnight hard takeoff: a system becomes capable of self-improving at all but doesn't seem very alarmingly good at it, somebody leaves it running overnight, exponentials kick in, and there is no morning.
(At least some) AGIs act much like a colonizing civilization. Plenty of humans ally with it, trade with it, try to get it to fight their outgroup, etc, and the AGIs locally respect the agreements with the humans and cooperate with their allies, but the end result is humanity gradually losing all control and eventually dying out.
Perhaps early AGI involves lots of moderately-intelligent subagents. The AI as a whole mostly seems pretty aligned most of the time, but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight. (Think cancer, but more agentic.)
Perhaps the path to superintelligence looks like scaling up o1-style runtime reasoning to the point where we're using an LLM to simulate a whole society. But the effects of a whole society (or parts of a society) on the world are relatively decoupled from the things-individual-people-say-taken-at-face-value. For instance, lots of people talk a lot about reducing poverty, yet have basically-no effect on poverty. So developers attempt to rely on chain-of-thought transparency, and shoot themselves in the foot.

Replies from: johnswentworth, Buck, avturchin, johannes-c-mayer, karl-krueger, Simon Skade, lunatic_at_large, ozziegooen, Kajus

↑ comment by johnswentworth · 2025-01-10T15:09:32.292Z · LW(p) · GW(p)

Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it's relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that's mostly a bad thing; it's a form of streetlighting.

Replies from: Buck, nathan-helm-burger

↑ comment by Buck · 2025-01-10T18:49:27.609Z · LW(p) · GW(p)

Most of the problems you discussed here more easily permit hacky solutions than scheming does.

Replies from: Zvi

↑ comment by Zvi · 2025-01-13T20:06:50.995Z · LW(p) · GW(p)

Individually, for a particular manifestation of each issue, this is true: you can imagine doing a hacky solution to each one. But that assumes there is a list of such particular problems that if you check off all the boxes you win, rather than them being manifestations of broader problems. You do not want to get into a hacking contest if you're not confident your list is complete.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-01-13T20:17:51.019Z · LW(p) · GW(p)

True, but Buck's claim is still relevant as a counterargument to my claim about memetic fitness of the scheming story relative to all these other stories.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-10T19:59:04.211Z · LW(p) · GW(p)

This is an interesting point. I disagree that scheming vs these ideas you mention is much of a 'streetlighting' case. I do, however, have my own fears that 'streetlighting' is occurring and causing some hard-but-critical avenues of risk to be relatively neglected.

[Edit: on further thought, I think this might not just be a "streetlighting" effect, but also a "keeping my hands clean" effect. I think it's more tempting, especially for companies, to focus on harms that could plausibly be construed as being their fault. It's my impression that, for instance, employees of a given company might spend a disproportionate amount of time thinking about how to keep their company's product from harming people vs the general class of products from harming people. They may also be less inclined to think about harm which could be averted via application of their product. This is an additional reason for concern that having the bulk of AI safety work being funded by / done in AI companies will lead to correlated oversights.]

My concerns that I think are relatively neglected in AI safety discourse are mostly related to interactions with incompetent or evil humans. Good alignment and control techniques don't do any good if someone opts not to use them in some critical juncture.

Some potential scenarios:

If AI is very powerful, and held in check tenuously by fragile control systems, it might be released from control by a single misguided human or some unlucky chain of events, and then go rogue.
If algorithmic progress goes surprisingly quickly, we might find ourselves in a regime where a catastrophically dangerous AI can be assembled from some mix of pre-existing open-weights models, plus fine-tuning, plus new models trained with new algorithms, and probably all stitched together with hacky agent frameworks. Then all it would take would be for sufficient hints about this algorithmic discovery to leak, and someone in the world to reverse-engineer it, and then there would be potent rogue AI all over the internet all of a sudden.
If the AI is purely intent-aligned, a bad human might use it to pursue broad coercive power.
Narrow technical AI might unlock increasingly powerful and highly offense-dominant technology with lower and lower activation costs (easy to build and launch with common materials). Even if the AI itself never got out of hand, if the dangerous tech secrets got leaked (or controlled by an aggressive government) then things could go very poorly for the world.

↑ comment by Buck · 2025-01-10T18:48:47.410Z · LW(p) · GW(p)

IMO the main argument for focusing on scheming risk is that scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful (as I discuss here [AF · GW]). These other problems all seem like they require the models to be way smarter in order for them to be a big problem. Though as I said here [LW(p) · GW(p)], I'm excited for work on some non-scheming misalignment risks.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-01-10T19:27:22.858Z · LW(p) · GW(p)

scheming is the main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful...

Seems quite wrong. The main plausible source of catastrophic risk from the first AIs that either pose substantial misalignment risk or that are extremely useful is that they cause more powerful AIs to be built which will eventually be catastrophic, but which have problems that are not easily iterable-upon (either because problems are hidden, or things move quickly, or ...).

And causing more powerful AIs to be built which will eventually be catastrophic is not something which requires a great deal of intelligent planning; humanity is already racing in that direction on its own, and it would take a great deal of intelligent planning to avert it. This story, for example:

People try to do the whole "outsource alignment research to early AGI" thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they're already on the more-powerful next gen, so it's too late.

This story sounds clearly extremely plausible (do you disagree with that?), involves exactly the sort of AI you're talking about ("the first AIs that either pose substantial misalignment risk or that are extremely useful"), but the catastrophic risk does not come from that AI scheming. It comes from people being dumb by default, the AI making them think it's ok (without particularly strategizing to do so), and then people barreling ahead until it's too late.

These other problems all seem like they require the models to be way smarter in order for them to be a big problem.

Also seems false? Some of the relevant stories:

As mentioned above, the "outsource alignment to AGI" failure-story was about exactly the level of AI you're talking about.
In worlds where hard takeoff naturally occurs, it naturally occurs when AI is just past human level in general capabilities (and in particular AI R&D), which I expect is also roughly the same level you're talking about (do you disagree with that?).
The story about an o1-style AI does not involve far possibilities and would very plausibly kick in at-or-before the first AIs that either pose substantial misalignment risk or that are extremely useful.

A few of the other stories also seem debatable depending on trajectory of different capabilities, but at the very least those three seem clearly potentially relevant even for the first highly dangerous or useful AIs.

Replies from: Buck

↑ comment by Buck · 2025-01-10T19:32:26.083Z · LW(p) · GW(p)

People try to do the whole "outsource alignment research to early AGI" thing, but the human overseers are themselves sufficiently incompetent at alignment of superintelligences that the early AGI produces a plan which looks great to the overseers (as it was trained to do), and that plan totally fails to align more-powerful next-gen AGI at all. And at that point, they're already on the more-powerful next gen, so it's too late.
This story sounds clearly extremely plausible (do you disagree with that?), involves exactly the sort of AI you're talking about ("the first AIs that either pose substantial misalignment risk or that are extremely useful"), but the catastrophic risk does not come from that AI scheming.

This problem seems important (e.g. it's my last bullet here [LW(p) · GW(p)]). It seems to me much easier to handle, because if this problem is present, we ought to be able to detect its presence by using AIs to do research on other subjects that we already know a lot about (e.g. the string theory analogy here [LW · GW]). Scheming is the only reason why the model would try to make it hard for us to notice that this problem is present.

Replies from: johnswentworth, Charlie Steiner

↑ comment by johnswentworth · 2025-01-10T21:58:09.871Z · LW(p) · GW(p)

A few problems with this frame.

First: you're making reasonably-pessimistic assumptions about the AI, but very optimistic assumptions about the humans/organization. Sure, someone could look for the problem by using AIs to do research on other subjects that we already know a lot about. But that's a very expensive and complicated project - a whole field, and all the subtle hints about it, need to be removed from the training data, and then a whole new model trained! I doubt that a major lab is going to seriously take steps much cheaper and easier than that, let alone something that complicated.

One could reasonably respond "well, at least we've factored apart the hard technical bottleneck from the part which can be solved by smart human users or good org structure". Which is reasonable to some extent, but also... if a product requires a user to get 100 complicated and confusing steps all correct in order for the product to work, then that's usually best thought of as a product design problem, not a user problem. Making the plan at least somewhat robust to people behaving realistically less-than-perfectly is itself part of the problem.

Second: looking for the problem by testing on other fields itself has subtle failure modes, i.e. various ways to Not Measure What You Think You Are Measuring [LW · GW]. A couple off-the-cuff examples:

A lab attempting this strategy brings in some string theory experts to evaluate their attempts to rederive string theory with AI assistance. But maybe (as I've heard claimed many times) string theory is itself an empty echo-chamber, and some form of sycophancy or telling people what they want to hear is the only way this AI-assisted attempt gets a good evaluation from the string theorists.
It turns out that fields-we-don't-understand mostly form a natural category distinct from fields-we-do-understand, or that we don't understand alignment precisely because our existing tools which generalize across many other fields don't work so well on alignment. Either of those would be a (not-improbable-on-priors) specific reason to expect that our experience attempting to rederive some other field does not generalize well to alignment.

And to be clear, I don't think of these as nitpicks, or as things which could go wrong separately from all the things originally listed. They're just the same central kinds of failure modes showing up again, and I expect them to generalize to other hacky attempts to tackle the problem.

Third: it doesn't really matter whether the model is trying to make it hard for us to notice the problem. What matters is (a) how likely we are to notice the problem "by default", and (b) whether the AI makes us more or less likely to notice the problem, regardless of whether it's trying to do so. The first story at top-of-thread is a good central example here:

Perhaps the path to superintelligence looks like applying lots of search/optimization over shallow heuristics. Then we potentially die to things which aren't smart enough to be intentionally deceptive, but nonetheless have been selected-upon to have a lot of deceptive behaviors (via e.g. lots of RL on human feedback).

Generalizing that story to attempts to outsource alignment work to earlier AI: perhaps the path to moderately-capable intelligence looks like applying lots of search/optimization over shallow heuristics. If the selection pressure is sufficient, that system may well learn to e.g. be sycophantic in exactly the situations where it won't be caught... though it would be "learning" a bunch of shallow heuristics with that de-facto behavior, rather than intentionally "trying" to be sycophantic in exactly those situations. Then the sycophantic-on-hard-to-verify-domains AI tells the developers that of course their favorite ideas for aligning the next generation of AI will work great, and it all goes downhill from there.

Replies from: chess-ice

↑ comment by Dakara (chess-ice) · 2025-01-11T11:38:51.650Z · LW(p) · GW(p)

All 3 points seem very reasonable, looking forward to Buck's [LW · GW] response to them.

Replies from: chess-ice

↑ comment by Dakara (chess-ice) · 2025-01-13T16:12:33.279Z · LW(p) · GW(p)

Additionally, I am curious to hear if Ryan's [LW · GW] views on the topic are similar to Buck's [LW · GW], given that they work at the same organization.

↑ comment by Charlie Steiner · 2025-01-11T23:36:36.406Z · LW(p) · GW(p)

One big reason I might expect an AI to do a bad job at alignment research is if it doesn't do a good job (according to humans) of resolving cases where humans are inconsistent or disagree. How do you detect this in string theory research? Part of the reason we know so much about physics is humans aren't that inconsistent about it and don't disagree that much. And if you go to sub-topics where humans do disagree, how do you judge its performance (because 'be very convincing to your operators' is an objective with a different kind of danger)?

Another potential red flag is if the AI gives humans what they ask for even when that's 'dumb' according to some sophisticated understanding of human values. This could definitely show up in string theory research (note when some ideas suggest non-string-theory paradigms might be better, and push back on the humans if the humans try to ignore this), it's just intellectually difficult (maybe easier in loop quantum gravity research heyo gottem) and not as salient without the context of alignment and human values.

↑ comment by avturchin · 2025-01-10T19:53:10.340Z · LW(p) · GW(p)

I once counted several dozen ways AI could cause human extinction; maybe some of the ideas will help (map, text).

Replies from: Will Aldred

↑ comment by _will_ (Will Aldred) · 2025-01-11T10:44:15.787Z · LW(p) · GW(p)

See also ‘The Main Sources of AI Risk? [LW · GW]’ by Wei Dai and Daniel Kokotajlo, which puts forward 35 routes to catastrophe (most of which are disjunctive). (Note that many of the routes involve something other than intent alignment going wrong.)

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2025-01-11T13:07:47.132Z · LW(p) · GW(p)

Another one: We manage to solve alignment to a significant extent. The AI, which is much smarter than a human, thinks that it is aligned, and takes aligned actions. The AI even predicts that it will never become unaligned to humans. However, at some point in the future, as the AI naturally unrolls into a reflectively stable equilibrium, it becomes unaligned.

↑ comment by Karl Krueger (karl-krueger) · 2025-01-11T03:18:09.683Z · LW(p) · GW(p)

I see a lot of discussion of AI doom stemming from research, business, and government / politics (including terrorism). Not a lot about AI doom from crime. Criminals don't stay in the box; the whole point of crime is to benefit yourself by breaking the rules and harming others. Intentional creation of intelligent cybercrime tools — ecosystems of AI malware, exploit discovery, spearphishing, ransomware, account takeovers, etc. — seems like a path to uncontrolled evolution of explicitly hostile AGI, where a maxim of "discover the rules; break them; profit" is designed-in.

↑ comment by Towards_Keeperhood (Simon Skade) · 2025-01-12T16:34:01.159Z · LW(p) · GW(p)

Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. Though in any case there's the problem that people will find proposals which very likely don't actually work but which they can easily believe will work, thereby making an AI stop a bit less likely.)

↑ comment by lunatic_at_large · 2025-01-10T15:25:19.846Z · LW(p) · GW(p)

My initial reaction is that at least some of these points would be covered by the Guaranteed Safe AI agenda if that works out, right? Though the "AGIs act much like a colonizing civilization" situation does scare me because it's the kind of thing which locally looks harmless but collectively is highly dangerous. It would require no misalignment on the part of any individual AI.

↑ comment by ozziegooen · 2025-01-15T00:29:55.479Z · LW(p) · GW(p)

This came from a Facebook thread where I argued that many of the main ways AI was described as failing fall into a few categories (John disagreed).

I appreciated this list, but they strike me as fitting into a few clusters.

...I would flag that much of that is unsurprising to me, and I think categorization can be pretty fine.
In order:
1) If an agent is unwittingly deceptive in ways that are clearly catastrophic, and that could be understood by a regular person, I'd probably put that under the "naive" or "idiot savant" category. As in, it has severe gaps in its abilities that a human or reasonable agent wouldn't. If the issue is that all reasonable agents wouldn't catch the downsides of a certain plan, I'd probably put that under the "we made a pretty good bet given the intelligence that we had" category.
2) I think that "What Failure Looks Like" is less Accident risk, more "Systemic" risk. I'm also just really unsure what to think about this story. It feels to me like it's a situation where actors are just not able to regulate externalities or similar.
3) The "fusion power generator scenario" seems like just a bad analyst to me. A lot of the job of an analyst is to flag important considerations. This seems like a pretty basic ask. For this itself to be the catastrophic part, I think we'd have to be seriously bad at this. (i.e. "Idiot Savant")
4) STEM-AGI -> I'd also put this in the naive or "idiot savant" category.
5) "that plan totally fails to align more-powerful next-gen AGI at all" -> This seems orthogonal to "categorizing the types of unalignment". This describes how incentives would create an unaligned agent, not what the specific alignment problem is. I do think it would be good to have better terminology here, but would probably consider it a bit adjacent to the specific topic of "AI alignment" - more like "AI alignment strategy/policy" or something.
6) "AGIs act much like a colonizing civilization" -> This sounds like either unalignment has already happened, or humans just gave AIs their own power+rights for some reason. I agree that's bad, but it seems like a different issue than what I think of as the alignment problem. More like, "Yea, if unaligned AIs have a lot of power and agency and different goals, that would be suboptimal"
7) "but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight." -> This sounds like a traditional mesa-agent failure. I expect a lot of "alignment" with a system made of a bunch of subcomponents is "making sure no subcomponents do anything terrible." Also, still leaves open the specific way this subsystem becomes/is unaligned.
8) "using an LLM to simulate a whole society." -> Sorry, I don't quite follow this one.

Personally, I like the focus "scheming" has. At the same time, I imagine there are another 5 to 20 clean concerns we should also focus on (some of which have been getting attention).

While I realize there's a lot we can't predict, I think we could do a much better job just making lists of different risk factors and allocating research amongst them.

↑ comment by Kajus · 2025-01-21T22:10:20.972Z · LW(p) · GW(p)

Some of the stories assume a lot of AIs, wouldn't a lot of human-level AIs be very good at creating a better AI? Also it seems implausible to me that we will get a STEM-AGI that doesn't think about humans much but is powerful enough to get rid of the atmosphere. On a different note, evaluating plausibility of scenarios is a whole different thing that basically very few people do and write about in AI safety.

Replies from: weibac

↑ comment by Milan W (weibac) · 2025-01-25T06:21:33.645Z · LW(p) · GW(p)

Some of the stories assume a lot of AIs, wouldn't a lot of human-level AIs be very good at creating a better AI?

That is a pretty reasonable assumption. AFAIK that is what the labs plan to do.

Replies from: Kajus

↑ comment by Kajus · 2025-02-02T17:11:50.952Z · LW(p) · GW(p)

What I think is that there won't be a time longer than 5 years where we have a lot of AIs and no superhuman AI. Basically, the first thing AIs will be used for will be self-improvement, and quickly after reasonable AI agents we will get superhuman AI. Like 6 years.

comment by johnswentworth · 2025-04-14T17:50:04.500Z · LW(p) · GW(p)

... But It's Fake Tho

Epistemic status: I don't fully endorse all this, but I think it's a pretty major mistake to not at least have a model like this sandboxed in one's head and check it regularly.

Full-cynical model of the AI safety ecosystem right now:

There’s OpenAI, which is pretending that it’s going to have full AGI Any Day Now, and relies on that narrative to keep the investor cash flowing in while they burn billions every year, losing money on every customer and developing a product with no moat. They’re mostly a hype machine, gaming metrics and cherry-picking anything they can to pretend their products are getting better. The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI.
Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Then there’s a significant contingent of academics who pretend to produce technical research on AI safety, but in fact mostly view their job as producing technical propaganda for the regulation activists and lobbyists. (Central example: Dan Hendrycks, who is the one person I directly name mainly because I expect he thinks of himself as a propagandist and will not be particularly offended by that description.) They also push the narrative, and benefit from it. They’re all busy bullshitting research. Some of them are quite competent propagandists though.
There’s another significant contingent of researchers (some at the labs, some independent, some academic) who aren’t really propagandists, but mostly follow the twitter-memetic incentive gradient in choosing their research. This tends to generate paper titles which sound dramatic, but usually provide pretty little conclusive evidence of anything interesting upon reading the details, and very much feed the narrative. This is the main domain of Not Measuring What You Think You Are Measuring [LW · GW] and Symbol/Referent Confusions [LW · GW].
Then of course there’s the many theorists who like to build neat toy models which are completely toy and will predictably not generalize usefully to real-world AI applications. This is the main domain of Ad-Hoc Mathematical Definitions [LW · GW], the theorists’ analogue of Not Measuring What You Think You Are Measuring.
Benchmarks. When it sounds like a benchmark measures something reasonably challenging, it nearly-always turns out that it’s not really measuring the challenging thing, and the actual questions/tasks are much easier than the pitch would suggest. (Central examples: software eng, GPQA, frontier math.) Also it always turns out that the LLMs’ supposedly-impressive achievement relied much more on memorization of very similar content on the internet than the benchmark designers expected.
Then there’s a whole crowd of people who feel real scared about AI (whether for good reasons or because they bought the Narrative pushed by all the people above). They mostly want to feel seen and validated in their panic. They have discussions and meetups and stuff where they fake doing anything useful about the problem, while in fact they mostly just emotionally vibe with each other. This is a nontrivial chunk of LessWrong content, as e.g. Val correctly-but-antihelpfully pointed out [LW · GW]. It's also the primary motivation behind lots of "strategy" work, like e.g. surveying AI researchers about their doom probabilities, or doing timeline forecasts/models.

… and of course none of that means that LLMs won’t reach supercritical self-improvement, or that AI won’t kill us, or [...]. Indeed, absent the very real risk of extinction, I’d ignore all this fakery and go about my business elsewhere. I wouldn’t be happy about it, but it wouldn’t bother me any more than all the (many) other basically-fake fields out there.

Man, I really just wish everything wasn’t fake all the time.

Replies from: louis-wenger, niplav, katalina-hernandez, uugr, Thane Ruthenis, charbel-raphael-segerie, Veedrac, Josephm, Kajus, wonder

↑ comment by LWLW (louis-wenger) · 2025-04-14T18:41:26.543Z · LW(p) · GW(p)

What makes you confident that AI progress has stagnated at OpenAI? If you don’t have the time to explain why I understand, but what metrics over the past year have stagnated?

↑ comment by niplav · 2025-04-14T18:08:00.692Z · LW(p) · GW(p)

Could you name three examples of people doing non-fake work? Since towardsness to non-fake work is easier to use for aiming than awayness from fake work.

Replies from: johnswentworth, tailcalled

↑ comment by johnswentworth · 2025-04-14T18:22:52.871Z · LW(p) · GW(p)

Chris Olah and Dan Murfet in the at-least-partially empirical domain. Myself in the theory domain, though I expect most people (including theorists) would not know what to look for to distinguish fake from non-fake theory work. In the policy domain, I have heard that Microsoft's lobbying team does quite non-fake work (though not necessarily in a good direction). In the capabilities domain, DeepMind's projects on everything except LLMs (like e.g. protein folding, or that fast matrix multiplication paper) seem consistently non-fake, even if they're less immediately valuable than they might seem at first glance. Also Conjecture seems unusually good at sticking to reality across multiple domains.

Replies from: D0TheMath

↑ comment by Garrett Baker (D0TheMath) · 2025-04-23T02:19:59.520Z · LW(p) · GW(p)

Conjecture seems unusually good at sticking to reality across multiple domains.

I do not get this impression, why do you say this?

↑ comment by tailcalled · 2025-04-14T19:48:40.851Z · LW(p) · GW(p)

The entire field is based on fears that consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency. This is basically wrong. Yes, people attempt to justify it with coherence theorems, but obviously you can be approximately-coherent/approximately-consequentialist and yet still completely un-agentic, so this justification falls flat. Since the field is based on a wrong assumption with bogus justification, it's all fake.

Replies from: steve2152, Kajus

↑ comment by Steven Byrnes (steve2152) · 2025-04-15T13:21:00.388Z · LW(p) · GW(p)

(IMO this is kinda unrelated to the OP, but I want to continue this thread.)

Have you elaborated on this anywhere?

Perhaps you missed it, but some guy in 2022 wrote this great post [LW · GW] which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)

I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-15T21:57:14.052Z · LW(p) · GW(p)

(IMO this is kinda unrelated to the OP, but I want to continue this thread.)

I think it's quite related to the OP. If a field is founded on a wrong assumption, then people only end up working in the field if they have some sort of blind spot, and that blind spot leads to their work being fake.

Have you elaborated on this anywhere?

Not hugely. One tricky bit is that it basically ends up boiling down to "the original arguments don't hold up if you think about them", but the exact way they don't hold up depends on what the argument is, so it's kind of hard to respond to in general.

Perhaps you missed it, but some guy in 2022 wrote this great post [LW · GW] which claimed that “Consequentialism, broadly defined, is a general and useful way to develop capabilities.” ;-)

Haha! I think I mostly still stand by the post. In particular, "Consequentialism, broadly defined, is a general and useful way to develop capabilities." remains true; it's just that intelligence relies on patterns and thus works much better on common things (which must be small, because they are fragments of a finite world), than on rare things (which can be big, though don't have to). This means that consequentialism isn't very good at developing powerful capabilities unless it works in an environment that has already been highly filtered to be highly homogenous, because an inhomogenous environment is going to BTFO the intelligence.

(I'm not sure I stand 100% by my post; there's some funky business about how to count evolution that I still haven't settled on yet. And I was too quick to go from "imitation learning isn't going to lead to far-superhuman abilities" to "consequentialism is the road to far-superhuman abilities". But yeah I'm actually surprised at how well I stand by my old view despite my massive recent updates.)

I’m actually just in the course of writing something about why “consequentialism provides an extremely powerful but difficult-to-align method of converting intelligence into agency” … maybe I can send you the draft for criticism when it’s ready?

Sounds good!

Replies from: steve2152, quetzal_rainbow

↑ comment by Steven Byrnes (steve2152) · 2025-04-15T23:08:59.567Z · LW(p) · GW(p)

I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)

I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.

(I think human brains have both [partly-] consequentialist decisions and self-supervised updating of the world-model.) (They’re not totally independent, but rather they interact via training data: e.g. [partly-] consequentialist decision-making determines how you move your eyes, and then whatever your eyes are pointing at, your model of the visual world will then update by self-supervised learning on that particular data. But still, these are two systems that interact, not the same thing.)

I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-16T07:30:21.575Z · LW(p) · GW(p)

I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.

This I'd dispute. If your model is underparameterized (which I think is true for the typical model?), then it can't learn any patterns that only occur once in the data. And even if the model is overparameterized, it still can't learn any pattern that never occurs in the data.

I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.

I'm saying that intelligence is the thing that allows you to handle patterns. So if you've got a dataset, intelligence allows you to build a model that makes predictions for other data based on the patterns it can find in said dataset. And if you have a function, intelligence allows you to find optima for said function based on the patterns it can find in said function.

Consequentialism is a way to set up intelligence to be agent-ish. This often involves setting up something that's meant to build an understanding of actions based on data or experience.

One could in principle cut my definition of consequentialism up into self-supervised learning and true consequentialism (this seems like what you are doing..?). One disadvantage with that is that consequentialist online learning is going to have a very big effect on the dataset one ends up training the understanding on, so they're not really independent of each other. Either way that just seems like a small labelling thing to me.

Replies from: steve2152

↑ comment by Steven Byrnes (steve2152) · 2025-04-16T11:38:46.151Z · LW(p) · GW(p)

If your model is underparameterized (which I think is true for the typical model?), then it can't learn any patterns that only occur once in the data. And even if the model is overparameterized, it still can't learn any pattern that never occurs in the data.

Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.

I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.

Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition” [LW(p) · GW(p)], but rather with intelligence.

Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.

Replies from: tailcalled, tailcalled

↑ comment by tailcalled · 2025-04-16T12:19:53.088Z · LW(p) · GW(p)

Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns,

I guess to add, I'm not talking about unknown unknowns. Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can't efficiently be derived from empirical data (except essentially by copying someone else's conclusion blindly, and that leaves you vulnerable to deception).

↑ comment by tailcalled · 2025-04-16T12:03:53.342Z · LW(p) · GW(p)

Dunno if anything’s changed since 2023, but this says LLMs learn things they’ve seen exactly once in the data.

I don't have time to read this study in detail until later today, but if I'm understanding it correctly, the study isn't claiming that neural networks will learn rare important patterns in the data, but rather that they will learn rare patterns that they were recently trained on. So if you continually train on data, you will see a gradual shift towards new patterns and forgetting old ones.

I can vouch that you can ask LLMs about things that are extraordinarily rare in the training data—I’d assume well under once per billion tokens—and they do pretty well. E.g. they know lots of random street names.

Random street names aren't necessarily important though? Like what would you do with them?

Humans successfully went to the moon, despite it being a quite different environment that they had never been in before. And they didn’t do that with “durability, strength, healing, intuition, tradition”, but rather with intelligence.

I didn't say that intelligence can't handle different environments, I said it can't handle heterogenous environments. The moon is nearly a sterile sphere in a vacuum; this is very homogenous, to the point where pretty much all of the relevant patterns can be found or created on Earth. It would have been more impressive if e.g. the USA could've landed a rocket with a team of Americans in Moscow than on the moon.

Also people did use durability, strength, healing, intuition and tradition to go to the moon. Like with strength, someone had to build the rockets (or build the machines which built the rockets). And without durability and healing, they would have been damaged too much in the process of doing that. Intuition and tradition are harder to clearly attribute, but they're part of it too.

Speaking of which, one can apply intelligence towards the problem of being resilient to unknown unknowns, and one would come up with ideas like durability, healing, learning from strategies that have stood the test of time (when available), margins of error, backup systems, etc.

Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that's less clear (and possibly not simple enough to be assembled manually, idk).

Margins of error and backup systems would be, idk, caution? Which, yes, definitely benefit from intelligence and consequentialism. Like I'm not saying intelligence and consequentialism are useless, in fact I agree that they are some of the most commonly useful things due to the frequent need to bypass common obstacles.

Replies from: steve2152

↑ comment by Steven Byrnes (steve2152) · 2025-04-16T12:54:37.095Z · LW(p) · GW(p)

Learning from strategies that stood the test of time would be tradition moreso than intelligence. I think tradition requires intelligence, but it also requires something else that's less clear (and possibly not simple enough to be assembled manually, idk).

Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t. There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.

I do think there’s a “something else” (most [but not all] humans have an innate drive to follow and enforce social norms, more or less), but I don’t think it’s necessary. The Wright Brothers didn’t have any innate drive to copy anything about bird soaring tradition, but they did it anyway purely by intelligence.

Random street names aren't necessarily important though?

I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?

Often the rare important things are very well known (after all, they are important, so people put a lot of effort into knowing them), they just can't efficiently be derived from empirical data (except essentially by copying someone else's conclusion blindly, and that leaves you vulnerable to deception).

I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.

This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
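The hypothesis-weighing process described above can be sketched as a simple Bayesian update. This is a minimal illustration only: the hypothesis names, priors, and likelihoods below are made-up numbers, not anything stated in the thread.

```python
# Minimal sketch: weighing hypotheses about "Joe told me X".
# All numbers are hypothetical and chosen purely for illustration.

def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses given P(observation | hypothesis)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Assumed prior beliefs before gathering any further evidence.
prior = {
    "correct_and_honest": 0.70,
    "mistaken_but_honest": 0.20,
    "deceiving": 0.05,
    "misheard": 0.05,
}

# New observation: I ask Joe to repeat himself and he says X again.
# That is unlikely if I had simply misheard him the first time, and
# about equally likely under the other three hypotheses.
repeat_likelihood = {
    "correct_and_honest": 0.95,
    "mistaken_but_honest": 0.95,
    "deceiving": 0.95,
    "misheard": 0.05,
}

posterior = bayes_update(prior, repeat_likelihood)
for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.3f}")
```

Asking Joe to repeat himself mostly rules out "misheard" but leaves the relative odds of the other hypotheses unchanged; separating those would need different evidence, such as Joe's track record or his incentives.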

(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even when they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)

I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-16T21:11:50.513Z · LW(p) · GW(p)

Right, that’s what I was gonna say. You need intelligence to sort out which traditions should be copied and which ones shouldn’t.

I think the necessity of intelligence for tradition exists on a much more fundamental level than that. Intelligence allows people to form an extremely rich model of the world with tons of different concepts. If one had no intelligence at all, one wouldn't even be able to copy the traditions. Like consider a collection of rocks or a forest; it can't pass any tradition onto itself.

But conversely, just as intelligence cannot be converted into powerful agency, I don't think it can be used to determine which traditions should be copied and which ones shouldn't.

There was a 13-billion-year “tradition” of not building e-commerce megastores, but Jeff Bezos ignored that “tradition”, and it worked out very well for him (and I’m happy about it too). Likewise, the Wright Brothers explicitly followed the “tradition” of how birds soar, but not the “tradition” of how birds flap their wings.

It seems to me that you are treating any variable attribute that's highly correlated across generations as a "tradition", to the point where not doing something is considered on the same ontological level as doing something. That is the sort of ontology that my LDSL series [? · GW] is opposed to.

I'm probably not the best person to make the case for tradition as (despite my critique of intelligence) I'm still a relatively strong believer in equilibration and reinvention.

I feel like I’ve lost the plot here. If you think there are things that are very important, but rare in the training data, and that LLMs consequently fail to learn, can you give an example?

Whenever there's any example of this that's too embarrassing or too big of an obstacle for applying them in a wide range of practical applications, a bunch of people point it out, and they come up with a fix that allows the LLMs to learn it.

The biggest class of relevant examples would all be things that never occur in the training data - e.g. things from my job, innovations like how to build a good fusion reactor, social relationships between the world's elites, etc. Though I expect you feel like these would be "cheating", because it doesn't have a chance to learn them?

The things in question often aren't things that most humans have a chance to learn, or even would benefit from learning. Often it's enough if just 1 person realizes and handles them, and alternately often if nobody handles them then you just lose whatever was dependent on them. Intelligence is a universal way to catch on to common patterns; other things than common patterns matter too, but there's no corresponding universal solution.

I guess you’re using “empirical data” in a narrow sense. If Joe tells me X, I have gained “empirical data” that Joe told me X. And then I can apply my intelligence to interpret that “data”. For example, I can consider a number of hypotheses: the hypothesis that Joe is correct and honest, that Joe is mistaken but honest, that Joe is trying to deceive me, that Joe said Y but I misheard him, etc. And then I can gather or recall additional evidence that favors one of those hypotheses over another. I could ask Joe to repeat himself, to address the “I misheard him” hypothesis. I could consider how often I have found Joe to be mistaken about similar things in the past. I could ask myself whether Joe would benefit from deceiving me. Etc.
This is all the same process that I might apply to other kinds of “empirical data” like if my car was making a funny sound. I.e., consider possible generative hypotheses that would match the data, then try to narrow down via additional observations, and/or remain uncertain and prepare for multiple possibilities when I can’t figure it out. This is a middle road between “trusting people blindly” versus “ignoring everything that anyone tells you”, and it’s what reasonable people actually do. Doing that is just intelligence, not any particular innate human tendency—smart autistic people and smart allistic people and smart callous sociopaths etc. are all equally capable of traveling this middle road, i.e. applying intelligence towards the problem of learning things from what other people say.
(For example, if I was having this conversation with almost anyone else, I would have quit, or not participated in the first place. But I happen to have prior knowledge that you-in-particular have unusual and well-thought-through ideas, and even when they’re wrong, they’re often wrong in very unusual and interesting ways, and that you don’t tend to troll, etc.)

You ran way deeper into the "except essentially by copying someone else's conclusion blindly, and that leaves you vulnerable to deception" point than I meant you to. My main point is that humans have grounding on important factors that we've acquired through non-intelligence-based means. I bring up the possibility of copying others' conclusions because for many of those factors, LLMs still have access to this via copying them.

It might be helpful to imagine what it would look like if LLMs couldn't copy human insights. For instance, imagine if there was a planet with life much like Earth's, but with no species that were capable of language. We could imagine setting up a bunch of cameras or other sensors on the planet and training a self-supervised learning algorithm on them. They could surely learn a lot about the world that way - but it also seems like they would struggle with a lot of things. The exact things they would struggle with might depend a lot on how much prior you build into the algorithm, and how dynamic the sensors are, and whether there are also ways for it to perform interventions upon the planet. But for instance even recognizing the continuity of animal lives as they wander off the screen would either require a lot of prior knowledge built in to the algorithm, or a very powerful learning algorithm (e.g. Solomonoff induction can use a simplicity prior to infer that there must be an entire planet full of animals off-screen, but that's computationally intractable).

(Also, again you still need to distinguish between "Is intelligence a useful tool for bridging lots of common gaps that other methods cannot handle?" vs "Is intelligence sufficient on its own to detect deception?". My claim is that the answer to the former is yes and the latter is no. To detect deception, you don't just use intelligence but also other facets of human agency.)

I feel like I’m misunderstanding you somehow. You keep saying things that (to me) seem like you could equally well argue that humans cannot possibly survive in the modern world, but here we are. Do you have some positive theory of how humans survive and thrive in (and indeed create) historically-unprecedented heterogeneous environments?

First, some things that might seem like nitpicks but are moderately important to my position:

In many ways, our modern world is much less heterogeneous than the past. For instance thanks to improved hygiene, we are exposed to far fewer diseases, and thanks to improved policing/forensics, we are exposed to much less violent crime. International trade allows us to average away troubles with crop failures. While distribution shifts generically should make it harder for humans to survive, they can (especially if made by humans) make it easier to survive.
Humans do not in fact survive; our average lifespan is less than 100 years. Humanity as a species survives by birthing, nurturing, and teaching children, and by collaborating with each other. My guess would be that aging is driven to a substantial extent by heterogeneity (albeit perhaps endogenous heterogeneity?) that hasn't been protected against. (I'm aware of John Wentworth's 'gears of aging' series arguing that aging has a common cause, but I've come to think that his arguments don't sufficiently distinguish between 'is eventually mediated by a common cause' vs 'is ultimately caused by a common cause'. By analogy, computer slowdowns may be said to be attributable to a small number of causes like CPU exhaustion, RAM exhaustion, network bandwidth exhaustion, etc., but these are mediators and the root causes will typically be some particular program that is using up those resources, and there's a huge number of programs in the world which could be to blame depending on the case.)
We actually sort of are in a precarious situation? The world wars were unprecedentedly bloody. They basically ended because of the invention of nukes, which are so destructive that we avoid using them in war. But I don't think we actually have a robust way to avoid that?

But more fundamentally, my objection to this question is that I doubt the meaningfulness of a positive theory of how humans survive and thrive. "Intelligence" and "consequentialism" are fruitful explanations of certain things because they can be fairly-straightforwardly constructed, have fairly well-characterizable properties, and even can be fairly well-localized anatomically in humans (e.g. parts of the brain).

Like one can quibble with the details of what counts as intelligence vs understanding vs consequentialism, but under the model where intelligence is about the ability to make use of patterns, you can hand a bunch of data to computer scientists and tell them to get to work untangling the patterns, and then it turns out there are some fairly general algorithms that can work on all sorts of datasets and patterns. (I find it quite plausible that we've already "achieved superhuman intelligence" in the sense that if you give both me and a transformer a big dataset that neither of us are pre-familiar with to study through, then (at least for sufficiently much data) eventually the transformer will clearly outperform me at predicting the next token.) And probably these fairly general algorithms are more-or-less the same sort of thing that much of the human brain is doing.

Thus "intelligence" factors out relatively nicely as a concept that can be identified as a major contributor to human success (I think intelligence is the main reason humans outperformed other primates). But this does not mean that the rest of human success can equally well be factored out into a small number of nicely attributable and implementable concepts. (Like, some of it probably can, but there's not as much reason to presume that all of it can. "Durability" and "strength" are examples of things that fairly well can, and indeed we have definitely achieved far-superhuman strength. These are purely physical though, whereas a lot of the important stuff has a strong cognitive element to it - though I suspect it's not purely cognitive...)

Replies from: steve2152

↑ comment by Steven Byrnes (steve2152) · 2025-04-18T18:50:18.853Z · LW(p) · GW(p)

OK, here’s my argument that, if you take {intelligence, understanding, consequentialism} as a unit, it’s sufficient for everything:

If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
- Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
- After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
- If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
If reducing heterogeneity is helpful, then {intelligence, understanding, consequentialism} can discover that fact, and figure out how to reduce heterogeneity.
Etc.

Replies from: tailcalled, tailcalled

↑ comment by tailcalled · 2025-04-20T11:06:37.748Z · LW(p) · GW(p)

Writing the part that I didn't get around to yesterday:

You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It'd be a massive technical challenge of course, because atoms don't really sit still and let you look and position them. But with sufficient work, it seems like someone could figure it out.

This doesn't really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can't assign them a goal. You might get an Age of Em-adjacent situation from it, though not even quite that.

To reverse-engineer people in order to make AI, you'd instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.

However, there's just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there's lots of reason to think humans are primarily adapted to those.

One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently-homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work to prepare the AI and each part of the activities the AI is engaging in, which will (with caveats) eliminate alignment problems because the AI only does the sorts of things you explicitly make it able to do.

The above is similar to how we don't worry so much about 'website misalignment' because generally there's a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn't have to be true, in the sense that there are many short programs with behavior that's not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don't know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.

(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won't lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)

↑ comment by tailcalled · 2025-04-19T22:18:15.718Z · LW(p) · GW(p)

After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!

I've grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it's much more powerful than individual intelligence (whether natural or artificial).

Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn't meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).

Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution's information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution, and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance, and then since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normally. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.
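The variance part of the mixture-niche argument above can be checked numerically. This is a rough sketch under stated assumptions: three hypothetical niches with a single Gaussian trait, equal population sizes, and made-up means and spreads. It only illustrates that pooling niches raises trait variance (the law of total variance). It does not model heritability, and the step from variance to speed of evolution is the comment's premise, not something the code demonstrates.

```python
import random
import statistics

random.seed(0)

# Hypothetical trait means for three specialized niches (illustrative only),
# each with the same within-niche standard deviation.
niche_means = [-2.0, 0.0, 2.0]
within_sd = 1.0
n_per_niche = 10_000

niches = [
    [random.gauss(mean, within_sd) for _ in range(n_per_niche)]
    for mean in niche_means
]

# Variance inside each specialized niche.
within_vars = [statistics.pvariance(pop) for pop in niches]

# The "mixture niche" draws organisms from every specialized niche.
mixture = [x for pop in niches for x in pop]
mixture_var = statistics.pvariance(mixture)

# Law of total variance (equal niche sizes):
# Var(mixture) = mean within-niche variance + variance of the niche means.
sample_means = [statistics.fmean(pop) for pop in niches]
predicted_var = statistics.fmean(within_vars) + statistics.pvariance(sample_means)

print("within-niche variances:", [round(v, 2) for v in within_vars])
print("mixture variance:", round(mixture_var, 2))
print("predicted by total-variance decomposition:", round(predicted_var, 2))
```

The further apart the niche means are, the larger the between-niche term and the more variance the mixture niche has to work with.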

(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, ... . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there's also often subniches.)

And then obviously beyond these points, individual intelligence and evolution focus on different things - what's happening recently vs what's happened deep in the past. Neither is perfect; society has changed a lot, which renders what's happened deep in the past less relevant than it could have been, but at the same time what's happening recently (I argue) intrinsically struggles with rare, powerful factors.

If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.

Part of the trouble is, if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don't have any good way of knowing which of these are the important ones or not.

You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)

The main theoretical hope is that one could use some clever algorithm to automatically sort of aggregate "small-scale" understanding (like an autoregressive convolutional model to predict next time given previous time) into "large-scale" understanding (being able to understand how a system could act extreme, by learning how it acts normally). But I've studied a bunch of different approaches for that, and ultimately it doesn't really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime that it was originally observed within, and also the methods to aggregate small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)

If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.

First, I want to emphasize that durability and strength are near the furthest towards the easy side because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogeneous environment because otherwise intelligence couldn't develop.

Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency because profit-maximizing companies don't want money tied up into durability or strength that you're not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent - and as a consequence, those people would then gain more agency.)

Also, I do get the impression you are overestimating the feasibility of "“durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern". I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it's relatively far from falling naturally out of the methods.

One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.

(I should maybe write more but it's past midnight and also I guess I wonder how you'd respond to this.)

↑ comment by quetzal_rainbow · 2025-04-16T08:40:59.849Z · LW(p) · GW(p)

Filter for homogeneity of environment is anthropic selection - if environment is sufficiently heterogeneous, it kills everyone who tries to reach out of its ecological niche, general intelligence doesn't develop and we are not here to have this conversation.

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-16T09:15:48.051Z · LW(p) · GW(p)

Nah, there are other methods than intelligence for survival and success. E.g. durability, strength, healing, intuition, tradition, ... . Most of these developed before intelligence did.

Replies from: quetzal_rainbow

↑ comment by quetzal_rainbow · 2025-04-16T10:31:36.673Z · LW(p) · GW(p)

I mean, we exist and we are at least somewhat intelligent, which implies strong upper bound on heterogeneity of environment.

On the other hand, words like "durability" imply possibility of categorization, which itself implies intelligence. If environment is sufficiently heterogeneous, you are durable at one second and evaporate at another.

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-17T09:10:21.126Z · LW(p) · GW(p)

I mean, we exist and we are at least somewhat intelligent, which implies strong upper bound on heterogeneity of environment.

We don't just use intelligence.

On the other hand, words like "durability" imply possibility of categorization, which itself implies intelligence. If environment is sufficiently heterogeneous, you are durable at one second and evaporate at another.

???

Vaporization is prevented by outer space which drains away energy.

Not clear why you say durability implies intelligence, surely trees are durable without intelligence.

Replies from: quetzal_rainbow

↑ comment by quetzal_rainbow · 2025-04-17T13:29:16.470Z · LW(p) · GW(p)

I feel like I'm failing to convey the level of abstraction I intend to.

I'm not saying that durability of object implies intelligence of object. I'm saying that if the world is ordered in a way that allows existence of distinct durable and non-durable objects, that means the possibility of intelligence which can notice that some objects are durable and some are not and exploit this fact.

If the environment is not ordered enough to contain intelligent beings, it's probably not ordered enough to contain distinct durable objects too.

To be clear, by "environment" I mean "the entire physics". When I say "environment not ordered enough" I mean "environment with physical laws chaotic enough to not contain ordered patterns".

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-17T15:05:02.568Z · LW(p) · GW(p)

It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.

Replies from: quetzal_rainbow

↑ comment by quetzal_rainbow · 2025-04-18T19:17:16.841Z · LW(p) · GW(p)

No, my point is that in worlds where intelligence is possible, almost all obstacles are common.

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-19T18:33:15.724Z · LW(p) · GW(p)

If there's some big object, then it's quite possible for it to diminish into a large number of similar obstacles, and I'd agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.

However, my assertion wasn't that intelligence cannot handle almost all obstacles, it was that consequentialism can't convert intelligence into powerful agency. It's enough for there to be rare powerful obstacles in order for this to fail.

↑ comment by Kajus · 2025-04-15T07:12:42.130Z · LW(p) · GW(p)

I don't think this is the claim that the post is making, but it still makes sense to me. The post is saying something like the opposite: that the people working in the field are not prioritizing correctly, or not thinking clearly about things, while the risk is real.

Replies from: tailcalled

↑ comment by tailcalled · 2025-04-15T07:18:01.877Z · LW(p) · GW(p)

I'm not trying to present johnswentworth's position, I'm trying to present my position.

↑ comment by Katalina Hernandez (katalina-hernandez) · 2025-04-15T10:51:55.802Z · LW(p) · GW(p)

I do not necessarily disagree with this, coming from a legal / compliance background. If you see any of my profiles, I constantly complain about "performative compliance" and "compliance theatre". Painfully present across the legal and governance sectors.

That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a "non fake" regulatory effort look like?

I don't think it would be okay to dismiss your take entirely, but it would be great to see what solutions you'd propose too. This is why I disagree in principle, because there are no specific points to contribute to.

In Europe, paradoxically, some of the people "close enough to the bureaucracy" that pushed for the AI Act to include GenAI providers, were OpenAI-adjacent.

But I will rescue this:

"(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI"

BigTech is too powerful to lobby against. "Stopping advanced AI" per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers to people's lives). Regulators can only prohibit development of products up to a certain point. They cannot just decide to "stop" development of technologies arbitrarily. But the AI Act does prohibit many types of AI systems already: see Article 5 (Prohibited AI Practices) of the EU Artificial Intelligence Act.

Those are considered to create unacceptable risks to people's lives and human rights.

Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.

↑ comment by uugr · 2025-04-15T15:22:26.858Z · LW(p) · GW(p)

"The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI."

This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another:

If the core products aren't really improving, the progress measured on benchmarks is fake. But if they are, the benchmarks are an (imperfect but still real) attempt to quantify that real improvement.
If LLMs are stagnating, all the people generating dramatic-sounding papers for each new SOTA are just maintaining a holding pattern. But if they're changing, then just studying/keeping up with the general properties of that progress is real. Same goes for people building and regularly updating their toy models of the thing.
Similarly, if the progress is fake, the propaganda signal-boosting that progress is also fake. If it isn't, it isn't. (At least directionally; a lot of that propaganda is still probably exaggerated.)
If the above three are all fake, all the people who feel real scared and want to be validated are stuck in a toxic emotional dead-end where they constantly freak out over fake things to no end. But if they're responding to legitimate, persistent worldview updates, having a space to vibe them out with like-minded others seems important.

So, in deciding whether or not to endorse this narrative, we'd like to know whether or not the models really ARE stagnating. What makes you think the appearance of progress here is illusory?

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-16T15:39:10.589Z · LW(p) · GW(p)

This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another [...]

Nope!

Even if the base models are improving, it can still be true that most of the progress measured on the benchmarks is fake, and has basically-nothing to do with the real improvements.

Even if the base models are improving, it can still be true that the dramatic sounding papers and toy models are fake, and have basically-nothing to do with the real improvements.

Even if the base models are improving, the propaganda about it can still be overblown and mostly fake, and have basically-nothing to do with the real improvements.

Even if the base models are improving, the people who feel real scared and just want to be validated can still be doing fake work and in fact be mostly useless, and their dynamic can still have basically-nothing to do with the real improvements.

Just because the base models are in fact improving does not mean that all this other stuff is actually coupled to the real improvement.

Replies from: uugr

↑ comment by uugr · 2025-04-19T13:28:46.496Z · LW(p) · GW(p)

Sounds like you're suggesting that real progress could be orthogonal to human-observed progress. I don't see how this is possible. Human-observed progress is too broad.

The collective of benchmarks, dramatic papers and toy models, propaganda, and doomsayers are suggesting the models are simultaneously improving at: writing code, researching data online, generating coherent stories, persuading people of things, acting autonomously without human intervention, playing Pokemon, playing Minecraft, playing chess, aligning to human values, pretending to align to human values, providing detailed amphetamine recipes, refusing to provide said recipes, passing the Turing test, writing legal documents, offering medical advice, knowing what they don't know, being emotionally compelling companions, correctly guessing the true authors of anonymous text, writing papers, remembering things, etc, etc.

They think all these improvements are happening at the same time in vastly different domains because they're all downstream of the same task, which is text prediction. So they lump them together in the general domain of 'capabilities', and call a model which can do all of them well a 'general intelligence'. If the products are stagnating, sure, all those perceived improvements could be bullshit. (Big 'if'!) But how could the models be 'improving' without improving at any of these things? What domains of 'real improvement' exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?

Replies from: gwern

↑ comment by gwern · 2025-04-19T23:04:03.180Z · LW(p) · GW(p)

What domains of 'real improvement' exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?

As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?

correctly guessing the true authors of anonymous text

See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the pretraining loss, is not applied anywhere (I hope), is unobvious that LLMs might do it and the capability does not naturally reveal itself in any standard use-cases (which is why people are shocked when it surfaces), and it would have been easy for no one to have observed it up until now or dismissed it, and even now after a lot of publicizing (including by yours truly), only a few weirdos know much about it.

Why can't there be plenty of other things like inner-monologue or truesight? ("Wait, you could do X? Why didn't you tell us?" "You never asked.")

What domains of 'real improvement' exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?

Maybe a better example would be to point out that 'emergent' tasks in general, particularly multi-step tasks, can have observed success rates of precisely 0 in feasible finite samples, but extreme brute-force sampling reveals hidden scaling. Humans would perceive zero improvement as the models scaled (0/100 = 0%, 0/100 = 0%, 0/100 = 0%...), even though they might be rapidly improving from 1/100,000 to 1/10,000 to 1/1,000 to... etc. "Sampling can show the presence of knowledge but not the absence."
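To make the arithmetic concrete, here is a minimal sketch assuming each sample is an independent trial with a fixed true success rate p (a simplification; real samples from a model need not be independent). It computes how likely an evaluator is to observe exactly 0 successes out of 100 at each of the hidden success rates mentioned above, and roughly how many samples would be needed to see at least one success with 95% probability. The function names are illustrative, not taken from any benchmark tooling.

```python
import math

def prob_zero_successes(p, n):
    """Probability that n independent trials with success rate p all fail."""
    return (1.0 - p) ** n

def samples_needed(p, confidence=0.95):
    """Smallest n such that at least one success is seen with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# Hidden success rates improving by 10x per model generation.
for p in (1e-5, 1e-4, 1e-3):
    print(
        f"true rate {p:g}: "
        f"P(0/100 observed) = {prob_zero_successes(p, 100):.4f}, "
        f"samples for a 95% chance of one success = {samples_needed(p)}"
    )
```

Even at a true rate of 1/1,000, a 100-sample evaluation still comes back 0/100 about nine times out of ten, so the observed score can stay flat across a hundredfold real improvement.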

Replies from: uugr

↑ comment by uugr · 2025-04-21T20:47:46.549Z · LW(p) · GW(p)

As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?

Oops, yes. I was thinking "domains of real improvement which humans are currently perceiving in LLMs", not "domains of real improvement which humans are capable of perceiving in general". So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be 'real' even if other discoveries are 'fake'.

That said, neither truesight nor inner-monologue seems uncoupled to the more common domains of improvement, as measured in benchmarks and toy models and people-being-scared. The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance. Truesight is narrower, but at the very least we'd expect it to correlate with skill in the common "write [x] in the style of [y]" prompt, right? Surely the same network of associations which lets it accurately generate "Eliezer Yudkowsky wrote this" after a given set of tokens, would also be useful for accurately finishing a sentence starting with "Eliezer Yudkowsky says...".

So I still wouldn't consider these things to have basically-nothing to do with commonly perceived domains of improvement.

↑ comment by Thane Ruthenis · 2025-04-17T01:05:36.453Z · LW(p) · GW(p)

46 agreement/disagreement-votes, 0 net agreement score

Gotta love how much of a perfect Scissor statement this is. (Same as my "o3 is not that impressive" [LW(p) · GW(p)].)

↑ comment by Charbel-Raphaël (charbel-raphael-segerie) · 2025-04-15T07:25:35.781Z · LW(p) · GW(p)

Then there’s the AI regulation activists and lobbyists. [...] Even if they do manage to pass any regulations on AI, those will also be mostly fake

SB1047 was a pretty close shot to something really helpful. The AI Act and its code of practice might be insufficient, but there are good elements in it that, if applied, would reduce the risks. The problem is that it won't be applied because of internal deployment.

But I sympathise somewhat with stuff like this:

They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People.

Replies from: johnswentworth, katalina-hernandez

↑ comment by johnswentworth · 2025-04-15T16:00:09.020Z · LW(p) · GW(p)

SB1047 was a pretty close shot to something really helpful.

No, it wasn't. It was a pretty close shot to something which would have gotten a step closer to another thing, which itself would have gotten us a step closer to another thing, which might have been moderately helpful at best.

Replies from: charbel-raphael-segerie

↑ comment by Charbel-Raphaël (charbel-raphael-segerie) · 2025-04-16T08:59:12.382Z · LW(p) · GW(p)

You really think those elements are not helpful? I'm really curious

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-16T15:28:53.234Z · LW(p) · GW(p)

Sure, they are more-than-zero helpful. Heck, in a relative sense, they'd be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.

One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. "Implement shutdown ability" would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that. "Implement reasonable safeguards to prevent societal-scale catastrophes" would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made it pretty easy for the labs to capture.

When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.

Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not slow down timelines by a factor of 10 or more.

Replies from: Thane Ruthenis, trevor, charbel-raphael-segerie

↑ comment by Thane Ruthenis · 2025-04-16T16:25:13.095Z · LW(p) · GW(p)

The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most

... or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And it would, due to a design constraint, need much less compute for committing omnicide.

How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren't "scale LLMs". Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren't really pushing the frontier today either; that wouldn't be much of a loss.

To what extent are the three AGI labs alive vs. dead players, then?

OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it and it's now a corpse whose apparent movement is just inertial (the reasoning models were invented prior to the coup, if Q* rumors are to be trusted, so it's little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best OpenAI researchers now, and it's apparently capable of inventing some novel tricks (whatever's the mystery behind Sonnet 3.5 and 3.6).
DeepMind is even now consistently outputting some interesting non-LLM research.

I think there's a decent chance that they're alive enough. Currently, they're busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people's attention on the potentially-doomed paradigm, if they're forced to correct the mistake (on this model) that they're making...

This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.

One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can't produce straight-line graphs suggesting godhood by 2027, and are reduced to "well we probably need a transformer-sized insight here...", it becomes much harder to generate hype and alarm that would be legible to investors and politicians.

But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? How much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish? Comparatively, how much downsizing would be caused by the chilling effect if the presumably doomed LLM paradigm is let to run its course of disappointing everyone by 2030 (when the AGI labs can scale no longer [LW(p) · GW(p)])?

On balance, upper-bounding FLOPs is probably still a positive thing to do. But I'm not really sure.

↑ comment by tlevin (trevor) · 2025-04-21T04:14:42.563Z · LW(p) · GW(p)

I disagree that the default would've been that the board would've been "easy for the labs to capture" (indeed, among the most prominent and plausible criticisms of its structure was that it would overregulate in response to political pressure), and thus that it wouldn't have changed deployment practices. I think the frontier companies were in a good position to evaluate this, and they decided to oppose the bill (and/or support it conditional on sweeping changes, including the removal of the Frontier Model Division).

Also, I'm confused when policy skeptics say things like "sure, it might slow down timelines by a factor of 2-3, big deal." Having 2-3x as much time is indeed a big deal!

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-21T05:33:47.496Z · LW(p) · GW(p)

Probably not going to have a discussion on the topic right now, but out of honest curiosity: did you read the bill?

↑ comment by Charbel-Raphaël (charbel-raphael-segerie) · 2025-04-17T17:45:24.609Z · LW(p) · GW(p)

I'm glad we agree "they'd be one of the biggest wins in AI safety to date."

"Implement shutdown ability" would not in fact be operationalized in a way which would ensure the ability to shutdown an actually-dangerous AI, because nobody knows how to do that

How so? It's pretty straightforward if the model is still contained in the lab.

"Implement reasonable safeguards to prevent societal-scale catastrophes" would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all

I think ticking boxes is good. This is how we went to the Moon, and it's much better to do this than to not do it. It's not trivial to tick all the boxes. Look at the number of boxes you need to tick if you want to follow the Code of Practice of the AI Act or this paper from DeepMind.

we simply do not have a way to reliably tell which models are and are not dangerous

How so? I think capabilities evaluations are much simpler than alignment evals, and at the very least we can run those. You might say: "A model might sandbag." Sure, but you can fine-tune it and see if the capabilities are recovered. If even with some fine-tuning the model is not able to do the tasks at all, modulo the problem of gradient hacking that is, I think, very unlikely, we can be pretty sure that the model wouldn't be capable of doing such a feat. I think at the very least, following the same methodology as the one followed by Anthropic in their last system cards is pretty good and would be very helpful.

↑ comment by Katalina Hernandez (katalina-hernandez) · 2025-04-15T09:19:30.283Z · LW(p) · GW(p)

100% agreed @Charbel-Raphaël [LW · GW].

The EU AI Act even mentions "alignment with human intent" explicitly, as a key concern for systemic risks. This is in Recital 110 (which defines what are systemic risks and how they may affect society).

I do not think any law has mentioned alignment like this before, so it's massive already.

Will a lot of the implementation efforts feel "fake"? Oh, 100%. But I'd say that this is why we (this community) should not disengage from it...

I also get that the regulatory landscape in the US is another world entirely (which is what the OP is bringing up).

↑ comment by Veedrac · 2025-04-15T20:04:39.191Z · LW(p) · GW(p)

Your very first point is, to be a little uncharitable, ‘maybe OpenAI's whole product org is fake.’ I know you have a disclaimer here but you're talking about a product category that didn't exist 30 months ago that today has this one website now reportedly used by 10% of people in the entire world and that, according to what's being reported online, expects ~12B revenue this year.

If your vibes are towards investing in that class of thing being fake or ‘mostly a hype machine’ then your vibes are simply not calibrated well in this domain.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-15T20:07:57.359Z · LW(p) · GW(p)

No, the model here is entirely consistent with OpenAI putting out some actual cool products. Those products (under the model) just aren't on a path to AGI, and OpenAI's valuation is very much reliant on being on a path to AGI in the not-too-distant future. It's the narrative about building AGI which is fake.

Replies from: Lblack, Veedrac

↑ comment by Lucius Bushnaq (Lblack) · 2025-04-15T22:39:59.056Z · LW(p) · GW(p)

OpenAI's valuation is very much reliant on being on a path to AGI in the not-too-distant future.

Really? I'm mostly ignorant on such matters, but I'd thought that their valuation seemed comically low compared to what I'd expect if their investors thought that OpenAI was likely to create anything close to a general superhuman AI system in the near future.^[1] I considered this evidence that they think all the AGI/ASI talk is just marketing.

^[1] Well ok, if they actually thought OpenAI would create superintelligence as I think of it, their valuation would plummet because giving people money to kill you with is dumb. But there's this space in between total obliviousness and alarm, occupied by a few actually earnest AI optimists. And, it seems to me, not occupied by the big OpenAI investors.

Replies from: Veedrac

↑ comment by Veedrac · 2025-04-16T08:38:07.161Z · LW(p) · GW(p)

Consider, in support: Netflix has a $418B market cap. It is inconsistent to think that a $300B valuation for OpenAI or whatever's in the news requires replacing tens of trillions of dollars of capital before the end of the decade.

Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances, consider that just a year ago the same argument would have discredited how they are valued today, and a year before that would have discredited where they were a year ago, and so forth. This holds similarly for historic busts in other companies. Investor sentiment is informational but clearly isn't definitive, else stocks would never change rapidly.

Replies from: Lblack

↑ comment by Lucius Bushnaq (Lblack) · 2025-04-16T09:24:42.080Z · LW(p) · GW(p)

Similarly, for people wanting to argue from the other direction, who might think a low current valuation is case-closed evidence against their success chances

To be clear: I think the investors would be wrong to think that AGI/ASI soon-ish isn't pretty likely.

↑ comment by Veedrac · 2025-04-15T20:11:09.533Z · LW(p) · GW(p)

But most of your criticisms in the point you gave have ~no bearing on that? If you want to make a point about how effectively OpenAI's research moves towards AGI you should be saying things relevant to that, not giving general malaise about their business model.

Or, I might understand ‘their business model is fake which implies a lack of competence about them broadly,’ but then I go back to the whole ‘10% of people in the entire world’ and ‘expects 12B revenue’ thing.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-15T20:23:28.888Z · LW(p) · GW(p)

The point of listing the problems with their business model is that they need the AGI narrative in order to fuel the investor cash, without which they will go broke at current spend rates. They have cool products, they could probably make a profit if they switched to optimizing for that (which would mean more expensive products and probably a lot of cuts), but not anywhere near the level of profits they'd need to justify the valuation.

Replies from: Veedrac

↑ comment by Veedrac · 2025-04-15T20:37:49.087Z · LW(p) · GW(p)

That's how I interpreted it originally; you were arguing their product org vibed fake, I was arguing your vibes were miscalibrated. I'm not sure what to say to this that I didn't say originally.

↑ comment by Joseph Miller (Josephm) · 2025-04-15T00:51:03.664Z · LW(p) · GW(p)

Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People.

The activists and the lobbyists are two very different groups. The activists are not trying to network with the DC people (yet). Unless you mean Encode, who I would call lobbyists, not activists.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-15T03:45:31.634Z · LW(p) · GW(p)

Good point, I should have made those two separate bullet points:

Then there’s the AI regulation lobbyists. They lobby and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time.
Also, there's the AI regulation activists, who e.g. organize protests. Like ~98% of protests in general, such activity is mostly performative and not the sort of thing anyone would end up doing if they were seriously reasoning through how best to spend their time in order to achieve policy goals. Calling it "fake" feels almost redundant. Insofar as these protests have any impact, it's via creating an excuse for friendly journalists to write stories about the dangers of AI (itself an activity which mostly feeds the narrative, and has dubious real impact).

(As with the top level, epistemic status: I don't fully endorse all this, but I think it's a pretty major mistake to not at least have a model like this sandboxed in one's head and check it regularly.)

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2025-04-15T04:31:39.082Z · LW(p) · GW(p)

Oh, if you're in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:

Vibe coders and "10x'd engineers", who (on this model) would be falling into one of the failure modes outlined here [LW · GW]: producing applications/features that didn't need to exist, creating pointless code bloat (which helpfully show up in productivity metrics like "volume of code produced" or "number of commits"), or "automatically generating" entire codebases in a way that feels magical, then spending so much time bugfixing them it eats up ~all perceived productivity gains.
e/acc and other Twitter AI fans, who act like they're bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-04-15T16:09:29.941Z · LW(p) · GW(p)

True, but I feel a bit bad about punching that far down.

↑ comment by Kajus · 2025-04-15T19:02:22.488Z · LW(p) · GW(p)

What are the other basically-fake fields out there?

Replies from: o-o

↑ comment by O O (o-o) · 2025-04-17T07:38:13.362Z · LW(p) · GW(p)

quantum computing, nuclear fusion

↑ comment by wonder · 2025-04-18T02:48:41.644Z · LW(p) · GW(p)

I share some similar frustrations, and unfortunately these are also prevalent in other parts of human society. The common thread in most of this fakeness seems to be impure intentions: there are impure/non-intrinsic motivations other than producing the best science/making true progress. Some of these motivations unfortunately could be based on survival/monetary pressure, and resolving that for true research or progress seems to be critical. We need to encourage a culture of pure motivations, and also equip ourselves with more ability/tools to distinguish extrinsic motivations.

comment by johnswentworth · 2024-12-24T15:52:38.074Z · LW(p) · GW(p)

On o3: for what feels like the twentieth time this year, I see people freaking out, saying AGI is upon us, it's the end of knowledge work, timelines now clearly in single-digit years, etc, etc. I basically don't buy it, my low-confidence median guess is that o3 is massively overhyped. Major reasons:

I've personally done 5 problems from GPQA in different fields and got 4 of them correct (allowing internet access, which was the intent behind that benchmark). I've also seen one or two problems from the software engineering benchmark. In both cases, when I look at the actual problems in the benchmark, they are easy, despite people constantly calling them hard and saying that they require expert-level knowledge.
- For GPQA, my median guess is that the PhDs they tested on were mostly pretty stupid. Probably a bunch of them were e.g. bio PhD students at NYU who would just reflexively give up if faced with even a relatively simple stat mech question which can be solved with a couple minutes of googling jargon and blindly plugging two numbers into an equation.
- For software engineering, the problems are generated from real git pull requests IIUC, and it turns out that lots of those are things like e.g. "just remove this if-block".
- Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven't checked (e.g. FrontierMath) is that they're also mostly much easier than they're hyped up to be.
On my current model of Sam Altman, he's currently very desperate to make it look like there's no impending AI winter, capabilities are still progressing rapidly, etc. Whether or not it's intentional on Sam Altman's part, OpenAI acts accordingly, releasing lots of very over-hyped demos. So, I discount anything hyped out of OpenAI, and doubly so for products which aren't released publicly (yet).
Over and over again in the past year or so, people have said that some new model is a total game changer for math/coding, and then David will hand it one of the actual math or coding problems we're working on and it will spit out complete trash. And not like "we underspecified the problem" trash, or "subtle corner case" trash. I mean like "midway through the proof it redefined this variable as a totally different thing and then carried on as though both definitions applied". The most recent model with which this happened was o1.
- Of course I am also tracking the possibility that this is a skill issue on our part, and if that's the case I would certainly love for someone to help us do better. See this thread [LW(p) · GW(p)] for a couple examples of relevant coding tasks.
- My median-but-low-confidence guess here is that basically-all the people who find current LLMs to be a massive productivity boost for coding are coding things which are either simple, or complex only in standardized ways - e.g. most web or mobile apps. That's the sort of coding which mostly involves piping things between different APIs and applying standard patterns, which is where LLMs shine.

Replies from: Buck, Thane Ruthenis, waterlubber, mtaran, kabir-kumar

↑ comment by Buck · 2024-12-25T21:52:26.867Z · LW(p) · GW(p)

I just spent some time doing GPQA, and I think I agree with you that the difficulty of those problems is overrated. I plan to write up more on this.

Replies from: Buck, Buck

↑ comment by Buck · 2024-12-26T23:01:40.983Z · LW(p) · GW(p)

@johnswentworth [LW · GW] Do you agree with me that modern LLMs probably outperform (you with internet access and 30 minutes) on GPQA diamond? I personally think this somewhat contradicts the narrative of your comment if so.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-12-26T23:21:32.859Z · LW(p) · GW(p)

I don't know, I have not specifically tried GPQA diamond problems. I'll reply again if and when I do.

Replies from: Raemon

↑ comment by Raemon · 2024-12-27T02:38:09.708Z · LW(p) · GW(p)

I at least attempted to be filtering the problems I gave you for GPQA diamond, although I am not very confident that I succeeded.

(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could complete without getting any wrong)

Replies from: johnswentworth, Raemon

↑ comment by johnswentworth · 2024-12-27T03:03:26.644Z · LW(p) · GW(p)

@Buck Apparently the five problems I tried were GPQA diamond, they did not take anywhere near 30 minutes on average (more like 10 IIRC?), and I got 4/5 correct. So no, I do not think that modern LLMs probably outperform (me with internet access and 30 minutes).

Replies from: Buck, Raemon, Buck

↑ comment by Buck · 2024-12-27T15:24:49.772Z · LW(p) · GW(p)

Ok, so sounds like given 15-25 mins per problem (and maybe with 10 mins per problem), you get 80% correct. This is worse than o3, which scores 87.7%. Maybe you'd do better on a larger sample: perhaps you got unlucky (extremely plausible given the small sample size) or the extra bit of time would help (though it sounds like you tried to use more time here and that didn't help). Fwiw, my guess from the topics of those questions is that you actually got easier questions than average from that set.
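To make the small-sample caveat concrete, here is a rough sketch of how little a 4/5 result pins down. It assumes (purely for illustration) that the five questions are independent and equally hard, and it treats the 87.7% figure quoted above as a true per-question accuracy; `binom_cdf` and `wilson_interval` are helper names made up for this sketch.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n, correct = 5, 4          # the 4/5 result discussed in this thread
o3_accuracy = 0.877        # score quoted above, treated as a per-question rate (assumption)

# Chance that someone with a true accuracy of 87.7% gets 4 or fewer of 5 right.
p_at_most_4 = binom_cdf(correct, n, o3_accuracy)

# Range of true accuracies that are roughly compatible with observing 4/5.
lo, hi = wilson_interval(correct, n)

print(f"P(<=4 of 5 correct | p = 0.877) = {p_at_most_4:.2f}")
print(f"95% Wilson interval for 4/5: ({lo:.2f}, {hi:.2f})")
```

Under those assumptions, someone exactly as accurate as the quoted o3 score would still miss at least one of five questions close to half the time, and the interval around 4/5 is wide enough to include 87.7%. So the five-question result on its own does not settle the comparison in either direction.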

I continue to think these LLMs will probably outperform (you with 30 mins). Unfortunately, the measurement is quite expensive, so I'm sympathetic to you not wanting to get to ground here. If you believe that you can beat them given just 5-10 minutes, that would be easier to measure. I'm very happy to bet here.

I think that even if it turns out you're a bit better than LLMs at this task, we should note that it's pretty impressive that they're competitive with you given 30 minutes!

So I still think your original post is pretty misleading [ETA: with respect to how it claims GPQA is really easy].

I think the models would beat you by more at FrontierMath.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-12-27T19:12:02.704Z · LW(p) · GW(p)

Even assuming you're correct here, I don't see how that would make my original post pretty misleading?

Replies from: Buck

↑ comment by Buck · 2024-12-27T19:54:56.673Z · LW(p) · GW(p)

I think that how you talk about the questions being “easy”, and the associated stuff about how you think the baseline human measurements are weak, is somewhat inconsistent with you being worse than the model.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-12-27T20:17:28.779Z · LW(p) · GW(p)

I mean, there are lots of easy benchmarks on which I can solve the large majority of the problems, and a language model can also solve the large majority of the problems, and the language model can often have a somewhat lower error rate than me if it's been optimized for that. Seems like GPQA (and GPQA diamond) are yet another example of such a benchmark.

Replies from: Buck

↑ comment by Buck · 2024-12-28T17:36:31.477Z · LW(p) · GW(p)

What do you mean by "easy" here?

↑ comment by Raemon · 2024-12-27T03:07:23.800Z · LW(p) · GW(p)

(my guess is you took more like 15-25 minutes per question? Hard to tell from my notes, you may have finished early but I don't recall it being crazy early)

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-12-27T04:38:33.560Z · LW(p) · GW(p)

I remember finishing early, and then spending a lot of time going back over all of them a second time, because the goal of the workshop was to answer correctly with very high confidence. I don't think I updated any answers as a result of the second pass, though I don't remember very well.

↑ comment by Buck · 2024-12-27T15:11:07.172Z · LW(p) · GW(p)

↑ comment by Raemon · 2024-12-27T02:57:34.945Z · LW(p) · GW(p)

(This seems like more time than Buck was taking – the goal was to not get any wrong so it wasn't like people were trying to crank through them in 7 minutes)

The problems I gave were (as listed in the csv for the diamond problems)

#1 (Physics) (1 person got right, 3 got wrong, 1 didn't answer)
#2 (Organic Chemistry), (John got right, I think 3 people didn't finish)
#4 (Electromagnetism), (John and one other got right, 2 got wrong)
#8 (Genetics) (3 got right including John)
#10 (Astrophysics) (5 people got right)

↑ comment by Buck · 2024-12-26T16:49:41.753Z · LW(p) · GW(p)

@johnswentworth [LW · GW] FWIW, GPQA Diamond seems much harder than GPQA main to me, and current models perform well on it. I suspect these models beat your performance on GPQA diamond if you're allowed 30 mins per problem. I wouldn't be shocked if you beat them (maybe I'm like 20%?), but that's because you're unusually broadly knowledgeable about science, not just because you're smart.

I personally get wrecked by GPQA chemistry, get ~50% on GPQA biology if I have like 7 minutes per problem (which is notably better than their experts from other fields get, with much less time), and get like ~80% on GPQA physics with less than 5 minutes per problem. But GPQA Diamond seems much harder.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-12-26T19:04:56.202Z · LW(p) · GW(p)

Is this with internet access for you?

Replies from: Buck

↑ comment by Buck · 2024-12-26T19:36:03.815Z · LW(p) · GW(p)

Yes, I'd be way worse off without internet access.

↑ comment by Thane Ruthenis · 2024-12-24T16:12:00.340Z · LW(p) · GW(p)

Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven't checked (e.g. FrontierMath) is that they're also mostly much easier than they're hyped up to be

Daniel Litt's account here supports this prejudice. As a math professor, he knew instantly how to solve the low/medium-level problems he looked at, and he suggests that each "high"-rated problem would be likewise instantly solvable by an expert in that problem's subfield.

And since LLMs have eaten ~all of the internet, they essentially have the crystallized-intelligence skills for all (sub)fields of mathematics (and human knowledge in general). So from their perspective, all of those problems are very "shallow". No human shares their breadth of knowledge, so math professors specialized even in slightly different subfields would indeed have to do a lot of genuine "deep" cognitive work; this is not the case for LLMs.

GPQA stuff is even worse, a literal advanced trivia quiz that seems moderately resistant to literal humans literally googling things, but not to the way the knowledge gets distilled into LLMs.

Basically, I don't think any extant benchmark (except I guess the Millennium Prize Eval) actually tests "deep" problem-solving skills, in a way LLMs can't cheat at using their overwhelming knowledge breadth.

My current strong-opinion-weakly-held is that they're essentially just extensive knowledge databases with a nifty natural-language interface on top.^[1] All of the amazing things they do should be considered surprising facts about how far this trick can scale; not surprising facts about how close we are to AGI.

^{^}
Which is to say: this is the central way to characterize what they are; not merely "isomorphic to a knowledge database with a natural-language search engine on top if you think about them in a really convoluted way". Obviously a human can also be considered isomorphic to database search if you think about it in a really convoluted way, but that wouldn't be the most-accurate way to describe a human.

Replies from: jarviniemi, sharmake-farah, notfnofn

↑ comment by Olli Järviniemi (jarviniemi) · 2024-12-25T21:33:28.218Z · LW(p) · GW(p)

[...] he suggests that each "high"-rated problem would be likewise instantly solvable by an expert in that problem's subfield.

This is an exaggeration and, as stated, false.

Epoch AI made 5 problems from the benchmark public. One of those was ranked "High", and that problem was authored by me.

It took me 20-30 hours to create that submission. (To be clear, I considered variations of the problem, ran into some dead ends, spent a lot of time carefully checking my answer was right, wrote up my solution, thought about guess-proof-ness^[1] etc., which ate up a lot of time.)
I would call myself an "expert in that problem's subfield" (e.g. I have authored multiple related papers).
I think you'd be very hard-pressed to find any human who could deliver the correct answer to you within 2 hours of seeing the problem.
- E.g. I think it's highly likely that I couldn't have done that (I think it'd have taken me more like 5 hours), I'd be surprised if my colleagues in the relevant subfield could do that, and I think the problem is specialized enough that few of the top people in CodeForces or Project Euler could do it.

On the other hand, I don't think the problem is very hard insight-wise - I think it's pretty routine, but requires care with details and implementation. There are certainly experts who can see the right main ideas quickly (including me). So there's something to the point of even FrontierMath problems being surprisingly "shallow". And as is pointed out in the FM paper, the benchmark is limited to relatively short-scale problems (hours to days for experts) - which really is shallow, as far as the field of mathematics is concerned.

But it's still an exaggeration to talk about "instantly solvable". Of course, there's no escaping Engel's maxim "A problem changes from impossible to trivial if a related problem was solved in training" - I guess the problem is instantly solvable to me now... but if you are hard-pressed to find humans that could solve it "instantly" when seeing it the first time, then I wouldn't describe it in those terms.

Also, there are problems in the benchmark that require more insight than this one.

^{^}
Daniel Litt writes about the problem: "This one (rated "high") is a bit trickier but with no thinking at all (just explaining what computation I needed GPT-4o to do) I got the first 3 digits of the answer right (the answer requires six digits, and the in-window python timed out before it could get this far)
Of course *proving* the answer to this one is correct is harder! But I do wonder how many of these problems are accessible to simulation/heuristics. Still an immensely useful tool but IMO people should take a step back before claiming mathematicians will soon be replaced".
I very much considered naive simulations and heuristics. The problem is getting 6 digits right, not 3. (The AIs are given a limited compute budget.) This is not valid evidence in favor of the problem's easiness or for the benchmark's accessibility to simulation/heuristics - indeed, this is evidence in the opposing direction.
See also Evan Chen's "I saw the organizers were pretty ruthless about rejecting problems for which they felt it was possible to guess the answer with engineer's induction."

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2024-12-25T23:15:55.481Z · LW(p) · GW(p)

Thanks, that's important context!

And fair enough, I used excessively sloppy language. By "instantly solvable", I did in fact mean "an expert would very quickly ("instantly") see the correct high-level approach to solving it, with the remaining work being potentially fiddly, but conceptually straightforward". "Instantly solvable" in the sense of "instantly know how to solve"/"instantly reducible to something that's trivial to solve".^[1]

Which was based on this quote of Litt's:

FWIW the "medium" and "low" problems I say I immediately knew how to do are very close to things I've thought about; the "high"-rated problem above is a bit further, and I suspect an expert closer to it would similarly "instantly" know the answer.

That said,

if you are hard-pressed to find humans that could solve it "instantly" when seeing it the first time, then I wouldn't describe it in those terms

If there are no humans who can "solve it instantly" (in the above sense), then yes, I wouldn't call it "shallow". But if such people do exist (even if they're incredibly rare), this implies that the conceptual machinery (in the form of theorems or ansatzes) for translating the problem into a trivial one already exists as well. Which, in turn, means it's likely present in the LLM's training data. And therefore, from the LLM's perspective, that problem is trivial to translate into a conceptually trivial problem.

It seems you'd largely agree with that characterization?

Note that I'm not arguing that LLMs aren't useful, or that they're unimpressive-in-every-sense. This is mainly an attempt to build a model of why LLMs seem to perform so well on apparently challenging benchmarks while reportedly falling flat on their faces on much simpler real-life problems.

^{^}
Or, closer to the way I natively think of it: In the sense that there are people (or small teams of people) with crystallized-intelligence skillsets such that they would be able to solve this problem by plugging their crystallized-intelligence skills one into another, without engaging in prolonged fluid-intelligence problem-solving.

Replies from: jarviniemi

↑ comment by Olli Järviniemi (jarviniemi) · 2024-12-26T00:42:19.634Z · LW(p) · GW(p)

This looks reasonable to me.

It seems you'd largely agree with that characterization?

Yes. My only hesitation is about how real-life-important it is for AIs to be able to do math for which very-little-to-no training data exists. The internet and the mathematical literature are so vast that, unless you are doing something truly novel, there's some relevant subfield there - in which case FrontierMath-style benchmarks would be informative of capability to do real math research.

Also, re-reading Wentworth's original comment, I note that o1 is weak according to FM. Maybe the things Wentworth is doing are just too hard for o1, rather than (just) overfitting-on-benchmarks style issues? In any case his frustration with o1's math skills doesn't mean that FM isn't measuring real math research capability.

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2024-12-26T01:25:11.284Z · LW(p) · GW(p)

The internet and the mathematical literature are so vast that, unless you are doing something truly novel, there's some relevant subfield there

Previously, I'd intuitively assumed the same as well: that it doesn't matter if LLMs can't "genuinely research/innovate", because there is enough potential for innovative-yet-trivial combination of existing ideas that they'd still massively speed up R&D by finding those combinations. ("Innovation overhang", as @Nathan Helm-Burger [LW · GW] puts it here [LW(p) · GW(p)].)

Back in early 2023, I'd considered it fairly plausible that the world would start heating up in 1-2 years due to such synthetically-generated innovations.

Except this... just doesn't seem to be happening? I'm yet to hear of a single useful scientific paper or other meaningful innovation that was spearheaded by an LLM.^[1] And they're already adept at comprehending such innovative-yet-trivial combinations if a human prompts them with those combinations. So it's not a matter of not yet being able to understand or appreciate the importance of such synergies. (If Sonnet 3.5.1 or o1 pro didn't do it, I doubt o3 would.)

Yet this is still not happening. My guess is that "innovative-yet-trivial combinations of existing ideas" are not actually "trivial", and LLMs can't do that for the same reasons they can't do "genuine research" (whatever those reasons are).

^{^}
Admittedly it's possible that this is totally happening all over the place and people are just covering it up in order to have all of the glory/status for themselves. But I doubt it: there are enough remarkably selfless LLM enthusiasts that if this were happening, I'd expect it would've gone viral already.

Replies from: sharmake-farah, nathan-helm-burger

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-26T02:59:18.187Z · LW(p) · GW(p)

There are 2 things to keep in mind:

It's only now that LLMs are reasonably competent in at least some hard problems, and at any rate, I expect RL to basically solve the domain, because of verifiability properties combined with quite a bit of training data.
We should wait a few years, as we have another scale-up that's coming up, and it will probably be quite a jump from current AI due to more compute:

https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/?commentId=7KSdmzK3hgcxkzmPX [LW · GW]

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2024-12-26T16:37:18.117Z · LW(p) · GW(p)

It's only now that LLMs are reasonably competent in at least some hard problems

I don't think that's the limiter here. Reports in the style of "my unpublished PhD thesis was about doing X using Y methodology, I asked an LLM to do that and it one-shot a year of my work! the equations it derived are correct!" have been around for quite a while. I recall it at least in relation to Claude 3, and more recently, o1-preview.

If LLMs are prompted to combine two ideas, they've been perfectly capable of "innovating" for ages now, including at fairly high levels of expertise. I'm sure there's some sort of cross-disciplinary GPQA-like benchmark that they've saturated a while ago, so this is even legible.

The trick is picking which ideas to combine/in what direction to dig. This doesn't appear to be something LLMs are capable of doing well on their own, nor do they seem to speed up human performance on this task. (All cases of them succeeding at it so far have been, by definition, "searching under the streetlight": checking whether they can appreciate a new idea that a human already found on their own and evaluated as useful.)

I suppose it's possible that o3 or its successors change that (the previous benchmarks weren't measuring that, but surely FrontierMath does...). We'll see.

I expect RL to basically solve the domain

Mm, I think it's still up in the air whether even the o-series efficiently scales (as in, without requiring a Dyson Swarm's worth of compute) to beating the Millennium Prize Eval (or some less legendary yet still major problems).

I expect such problems don't pass the "can this problem be solved by plugging the extant crystallized-intelligence skills of a number of people into each other in a non-contrived^[1] way?" test. Does RL training allow sidestepping this, letting the model generate new crystallized-intelligence skills?

I'm not confident one way or another.

we have another scale-up that's coming up

I'm bearish on that. I expect GPT-4 to GPT-5 to be palpably less of a jump than GPT-3 to GPT-4, same way GPT-3 to GPT-4 was less of a jump than GPT-2 to GPT-3. I'm sure it'd show lower loss, and saturate some more benchmarks, and perhaps an o-series model based on it clears FrontierMath, and perhaps programmers and mathematicians would be able to use it in an ever-so-bigger number of cases...

But I predict, with low-moderate confidence, that it still won't kick off a deluge of synthetically derived innovations. It'd have even more breadth and eye for nuance, but somehow, perplexingly, still no ability to use those capabilities autonomously.

^{^}
"Non-contrived" because technically, any cognitive skill is just a combination of e. g. NAND gates, since those are Turing-complete. But obviously that doesn't mean any such skill is accessible if you've learned the NAND gate. Intuitively, a combination of crystallized-intelligence skills is only accessible if the idea of combining them is itself a crystallized-intelligence skill (e. g., in the math case, a known ansatz).
Which perhaps sheds some light on why LLMs can't innovate even via trivial idea combinations. If a given idea-combination "template" wasn't present in the training data, the LLM can't reliably independently conceive of it except by brute-force enumeration...? This doesn't seem quite right, but maybe in the right direction.

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-26T16:55:17.188Z · LW(p) · GW(p)

I'm not confident one way or another.

I think my key crux is that in domains where there is a way to verify that the solution actually works, RL can scale to superhuman performance. Mathematics and programming are domains where it is unusually easy to verify solutions and gather training data for RL, so with caveats it can become rather good at those specific domains and at benchmarks like the Millennium Prize evals. The important caveat is that I don't believe this transfers very well to domains where verifying isn't easy, like creative writing.

I'm bearish on that. I expect GPT-4 to GPT-5 to be palpably less of a jump than GPT-3 to GPT-4, same way GPT-3 to GPT-4 was less of a jump than GPT-2 to GPT-3. I'm sure it'd show lower loss, and saturate some more benchmarks, and perhaps an o-series model based on it clears FrontierMath, and perhaps programmers and mathematicians would be able to use it in an ever-so-bigger number of cases...

I was talking about the 1 GW systems that would be developed in late 2026-early 2027, not GPT-5.

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2024-12-26T18:43:21.643Z · LW(p) · GW(p)

in domains where there is a way to verify that the solution actually works, RL can scale to superhuman performance

Sure, the theory on that is solid. But how efficiently does it scale off-distribution, in practice?

The inference-time scaling laws, much like the pretraining scaling laws, are ultimately based on test sets whose entries are "shallow" (in the previously discussed sense). It doesn't tell us much regarding how well the technique scales with the "conceptual depth" of a problem.

o3 took a million dollars in inference-time compute and unknown amounts in training-time compute just to solve the "easy" part of the FrontierMath benchmark (which likely takes human experts single-digit hours, maybe <1 hour for particularly skilled humans). How much would be needed for beating the "hard" subset of FrontierMath? How much more still would be needed for problems that take individual researchers days; or problems that take entire math departments months; or problems that take entire fields decades?

It's possible that the "synthetic data flywheel" works so well that the amount of human-researcher-hour-equivalents per unit of compute scales, say, exponentially with some aspect of o-series' training, and so o6 in 2027 solves the Riemann Hypothesis.

Or it scales not that well, and o6 can barely clear real-life equivalents of hard FrontierMath problems. Perhaps instead the training costs (generating all the CoT trees on which RL training is then done) scale exponentially, while researcher-hour-equivalents per compute units scale linearly.

It doesn't seem to me that we know which one it is yet. Do we?
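
To make the two stories above concrete, here is a toy sketch. Everything in it is an assumption made purely for illustration: the "reach" of a model generation (measured in researcher-hour-equivalents), the growth factor, the step size, and the problem depths are invented numbers, not estimates. It only shows how differently the two regimes behave as problems get "deeper".

```python
# Toy comparison of the two scaling stories. All numbers are hypothetical.

def generations_needed_exponential(hours, base_hours=1.0, growth=10.0):
    """Generations needed if solvable depth multiplies by `growth` each generation."""
    generations, reach = 0, base_hours
    while reach < hours:
        reach *= growth
        generations += 1
    return generations


def generations_needed_linear(hours, base_hours=1.0, step=10.0):
    """Generations needed if solvable depth grows by a fixed `step` hours each generation."""
    generations, reach = 0, base_hours
    while reach < hours:
        reach += step
        generations += 1
    return generations


# Made-up problem depths, in researcher-hours.
depths = {
    "easy benchmark problem": 1,
    "takes a researcher days": 40,
    "takes a department months": 20_000,
    "takes a field decades": 2_000_000,
}

for label, hours in depths.items():
    exp_gens = generations_needed_exponential(hours)
    lin_gens = generations_needed_linear(hours)
    print(f"{label}: exponential={exp_gens}, linear={lin_gens}")
```

Under the exponential story, each extra order of magnitude of depth costs one more generation; under the linear story, it costs roughly ten times as many generations as the previous one did. Which regime (if either) describes reality is exactly the open question.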

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-26T19:50:37.206Z · LW(p) · GW(p)

I don't think we know yet whether it will succeed in practice, or whether its training costs make it infeasible to do.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-12-26T18:28:13.233Z · LW(p) · GW(p)

Consider: https://www.cognitiverevolution.ai/can-ais-generate-novel-research-ideas-with-lead-author-chenglei-si/

I think a different phenomenon is occurring. My guess, updating on my own experience, is that ideas aren't the current bottleneck. 1% inspiration, 99% perspiration.

As someone who has been reading 3-20 papers per month for many years now, in neuroscience and machine learning, I feel overwhelmed with ideas. I average about 0.75 per paper. I write them down, and the lists grow faster than they shrink by two orders of magnitude.

When I was on my favorite industry team, what I most valued about my technical manager was his ability to help me sort through and prioritize them. It was like I created a bunch of LEGO pieces, he picked one to be next, I put it in place by coding it up, he checked the placement by reviewing my PR. If someone had offered me a source of ideas ranging in quality between worse than my worst ideas, and almost as good as my best ideas, and skewed towards bad... I'd have laughed and turned them down without a second thought.

For something like a paper instead of a minor tech idea for a 1-week PR... The situation is far more intense. The grunt work of running the experiments and preparing the paper is enormous compared to the time and effort of coming up with the idea in the first place. More like 0.1% to 99.9%.

Current LLMs can speed up creating a paper if given the results and experiment description to write about. That's probably also not the primary bottleneck (although still more than idea generation).

So the current bottleneck, in my estimation, for ML experiments, is the experiments. Coding up the experiments accurately and efficiently, running them (and handling the compute costs), analyzing the results.

So I've been expecting to see an acceleration dependent on that aspect. That's hard to measure though. Are LLMs currently speeding this work up a little? Probably. I've had my work sped up some by the recent Sonnet 3.5.1. Currently though it's a trade-off, there's overhead in checking for misinterpretations and correcting bugs. We still seem a long way in "capability space" from me being able to give a background paper and rough experiment description, and then having the model do the rest. Only once that's the case will idea generation become my bottleneck.

Replies from: johnswentworth, Thane Ruthenis

↑ comment by johnswentworth · 2024-12-26T18:36:30.528Z · LW(p) · GW(p)

That's the opposite of my experience. Nearly all the papers I read vary between "trash, I got nothing useful out besides an idea for a post explaining the relevant failure modes" and "high quality but not relevant to anything important". Setting up our experiments is historically much faster than the work of figuring out what experiments would actually be useful.

There are exceptions to this, large projects which seem useful and would require lots of experimental work, but they're usually much lower-expected-value-per-unit-time than going back to the whiteboard, understanding things better, and doing a simpler experiment once we know what to test.

Replies from: nathan-helm-burger, Thane Ruthenis

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-12-26T18:59:54.596Z · LW(p) · GW(p)

Ah, well, for most papers that spark an idea in me, the idea isn't simply an extension of the paper. It's a question tangentially related which probes at my own frontier of understanding.

I've always found that a boring lecture is a great opportunity to brainstorm because my mind squirms away from the boredom into invention and extrapolation of related ideas. A boring paper does some of the same for me, except that I'm less socially pressured to keep reading it, and thus less able to squeeze my mind with the boredom of it.

As for coming up with ideas... It is a weakness of mine that I am far better at generating ideas than at critiquing them (my own or others'). Which is why I worked so well in a team where I had someone I trusted to sort through my ideas and pick out the valuable ones. It sounds to me like you have a better filter on idea quality.

↑ comment by Thane Ruthenis · 2024-12-27T00:10:08.512Z · LW(p) · GW(p)

That's mostly my experience as well: experiments are near-trivial to set up, and setting up any experiment that isn't near-trivial to set up is a poor use of the time that can instead be spent thinking on the topic a bit more and realizing what the experimental outcome would be or why this would be entirely the wrong experiment to run.

But the friction costs of setting up an experiment aren't zero. If it were possible to sort of ramble an idea at an AI and then have it competently execute the corresponding experiment (or set up a toy formal model and prove things about it), I think this would be able to speed up even deeply confused/non-paradigmatic research.

... That said, I think the sorts of experiments we do aren't the sorts of experiments ML researchers do. I expect they're often things like "do a pass over this lattice of hyperparameters and output the values that produce the best loss" (and more abstract equivalents of this that can't be as easily automated using mundane code). And these, due to the atheoretic nature of ML, can't be "solved in the abstract".

So ML research perhaps could be dramatically sped up by menial-software-labor AIs. (Though I think even now the compute needed for running all of those experiments would be the more pressing bottleneck.)
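
For concreteness, the "pass over a lattice of hyperparameters" kind of experiment can be sketched in a few lines. This is a minimal illustration, not anyone's actual setup: `toy_loss` is a hypothetical stand-in for a real training run, and the hyperparameter names `lr` and `width` and their values are invented for the example.

```python
from itertools import product


def grid_search(loss_fn, grid):
    """Evaluate loss_fn at every point of the grid; return (best_params, best_loss)."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = loss_fn(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_loss(lr, width):
    # Hypothetical stand-in for "train a model and report its loss".
    # By construction, the minimum is at lr=0.01, width=128.
    return (lr - 0.01) ** 2 + ((width - 128) / 128) ** 2


# Hypothetical lattice of hyperparameters.
grid = {"lr": [0.001, 0.01, 0.1], "width": [64, 128, 256]}
best_params, best_loss = grid_search(toy_loss, grid)
print(best_params, best_loss)
```

The loop itself is trivial; in practice each call to the loss function is a full training run, which is why compute rather than code tends to be the binding constraint.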

↑ comment by Thane Ruthenis · 2024-12-26T18:34:19.836Z · LW(p) · GW(p)

Convincing.

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-24T16:51:07.462Z · LW(p) · GW(p)

of the amazing things they do should be considered surprising facts about how far this trick can scale; not surprising facts about how close we are to AGI.

I agree that the trick scaling as far as it has is surprising, but I'd disagree with the claim that this doesn't bear on AGI.

I do think that something like dumb scaling can mostly just work, and I think the main takeaway I take from AI progress is that there will not be a clear resolution to when AGI happens, as the first AIs to automate AI research will have very different skill profiles from humans, and most importantly we need to disentangle capabilities in a way we usually don't for humans.

I agree with faul sname here:

we should stop asking when we will get AGI and start asking about when we will see each of the phenomena that we are using AGI as a proxy for.

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2024-12-24T17:09:43.350Z · LW(p) · GW(p)

I do think that something like dumb scaling can mostly just work

The exact degree of "mostly" is load-bearing here. You'd mentioned [LW(p) · GW(p)] provisions for error-correction before. But are the necessary provisions something simple, such that the most blatantly obvious wrappers/prompt-engineering works, or do we need to derive some additional nontrivial theoretical insights to correctly implement them?

Last I checked, AutoGPT-like stuff has mostly failed, so I'm inclined to think it's closer to the latter.

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-24T17:16:44.526Z · LW(p) · GW(p)

Actually, I've changed my mind, in that the reliability issue probably does need at least non-trivial theoretical insights to make AIs work.

Replies from: faul_sname, sharmake-farah

↑ comment by faul_sname · 2024-12-24T23:58:53.393Z · LW(p) · GW(p)

I am unconvinced that "the" reliability issue is a single issue that will be solved by a single insight, rather than AIs lacking procedural knowledge of how to handle a bunch of finicky special cases that will be solved by online learning or very long context windows once hardware costs decrease enough to make one of those approaches financially viable.

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-25T00:52:48.940Z · LW(p) · GW(p)

Yeah, I'm sympathetic to this argument that there won't be a single insight, and that at least one approach will work out once hardware costs decrease enough, and I agree less with Thane Ruthenis's intuitions here than I did before.

↑ comment by Noosphere89 (sharmake-farah) · 2024-12-24T17:59:02.588Z · LW(p) · GW(p)

If I were to think about it a little, I'd suspect the big difference between LLMs and humans is state/memory: humans do have state/memory, but LLMs are currently more or less stateless, and RNN training has not been solved to the extent transformers were.

One thing I will also say is that AI winters will be shorter than previous AI winters, because AI products can now be sort of made profitable, and this gives an independent base of money for AI research in ways that weren't possible pre-2016.

Replies from: mateusz-baginski

↑ comment by Mateusz Bagiński (mateusz-baginski) · 2024-12-24T22:08:06.170Z · LW(p) · GW(p)

A factor stemming from the same cause but pushing in the opposite direction is that "mundane" AI profitability can "distract" people who would otherwise be AGI hawks.

↑ comment by notfnofn · 2024-12-24T16:55:25.686Z · LW(p) · GW(p)

↑ comment by waterlubber · 2024-12-24T17:55:44.555Z · LW(p) · GW(p)

I agree with you on your assessment of GPQA. The questions themselves appear to be low quality as well. Take this one example, although it's not from GPQA Diamond:

In UV/Vis spectroscopy, a chromophore which absorbs red colour light, emits _____ colour light.

The correct answer is stated as yellow and blue. However, the question should read transmits, not emits; molecules cannot trivially absorb and re-emit light of a shorter wavelength without resorting to trickery (nonlinear effects, two-photon absorption).

This is, of course, a cherry-picked example, but is exactly characteristic of the sort of low-quality science questions I saw in school (e.g. with a teacher or professor who didn't understand the material very well). Scrolling through the rest of the GPQA questions, they did not seem like questions that would require deep reflection or thinking, but rather the sort of trivia things that I would expect LLMs to perform extremely well on.

I'd also expect "popular" benchmarks to be easier/worse/optimized for looking good while actually being relatively easy. OAI et al. probably have the mother of all publication biases with respect to benchmarks, and are selecting very heavily for items within this collection.

↑ comment by mtaran · 2025-01-01T16:06:35.097Z · LW(p) · GW(p)

Re: LLMs for coding: One lens on this is that LLM progress changes the Build vs Buy calculus.

Low-power AI coding assistants were useful in both the "build" and "buy" scenarios, but they weren't impactful enough to change the actual border between build-is-better vs. buy-is-better. More powerful AI coding systems/agents can make a lot of tasks sufficiently easy that dealing with some components starts feeling more like buying than building. Different problem domains have different peak levels of complexity/novelty, so the easier domains will start being affected more and earlier by this build/buy decision boundary shift. Many people don't travel far from their primary domains, so to some of them it will look like the shift is happening quickly (because it is, in their vicinity) even though on the larger scale it's still pretty gradual.

↑ comment by Kabir Kumar (kabir-kumar) · 2024-12-27T23:40:06.335Z · LW(p) · GW(p)

Personally, I think o1 is uniquely trash; I think o1-preview was actually better. I'm getting better things, on average, from DeepSeek and Sonnet 3.5 atm.

comment by johnswentworth · 2025-03-10T04:29:15.465Z · LW(p) · GW(p)

Hypothesis: for smart people with a strong technical background, the main cognitive barrier to doing highly counterfactual technical work is that our brains' attention is mostly steered by our social circle. Our thoughts are constantly drawn to think about whatever the people around us talk about. And the things which are memetically fit are (almost by definition) rarely very counterfactual to pay attention to, precisely because lots of other people are also paying attention to them.

Two natural solutions to this problem:

build a social circle which can maintain its own attention, as a group, without just reflecting the memetic currents of the world around it.
"go off into the woods", i.e. socially isolate oneself almost entirely for an extended period of time, so that there just isn't any social signal to be distracted by.

These are both standard things which people point to as things-historically-correlated-with-highly-counterfactual-work. They're not mutually exclusive, but this model does suggest that they can substitute for each other - i.e. "going off into the woods" can substitute for a social circle with its own useful memetic environment, and vice versa.

Replies from: aysja, mateusz-baginski, rauno-arike, leogao, Viliam, Amyr, ozziegooen

↑ comment by aysja · 2025-03-10T23:53:28.361Z · LW(p) · GW(p)

One thing that I do after social interactions, especially those which pertain to my work, is to go over all the updates my background processing is likely to make and to question them more explicitly.

This is helpful because I often notice that the updates I’m making aren’t related to reasons much at all. It’s more like “ah they kind of grimaced when I said that, so maybe I'm bad?” or like “they seemed just generally down on this approach, but wait are any of those reasons even new to me? Haven’t I already considered those and decided to do it anyway?” or “they seemed so aggressively pessimistic about my work, but did they even understand what I was saying?” or “they certainly spoke with a lot of authority, but why should I trust them on this, and do I even care about their opinion here?” Etc. A bunch of stuff which at first blush my social center is like “ah god, it’s all over, I’ve been an idiot this whole time” but with some second glancing it’s like “ah wait no, probably I had reasons for doing this work that withstand surface level pushback, let’s remember those again and see if they hold up.” And often (always?) they do.

This did not come naturally to me; I’ve had to train myself into doing it. But it has helped a lot with this sort of problem, alongside the solutions you mention, i.e. becoming more of a hermit and trying to surround myself with people engaged in more timeless thought.

↑ comment by Mateusz Bagiński (mateusz-baginski) · 2025-03-10T06:29:49.493Z · LW(p) · GW(p)

solution 2 implies that a smart person with a strong technical background would go on to work on important problems (by default) which is not necessarily universally true and it's IMO likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on

Replies from: johnswentworth, faul_sname, D0TheMath, lahwran

↑ comment by johnswentworth · 2025-03-10T17:31:01.208Z · LW(p) · GW(p)

The claim is not that either "solution" is sufficient for counterfactuality, it's that either solution can overcome the main bottleneck to counterfactuality. After that, per Amdahl's Law, there will still be other (weaker) bottlenecks to overcome, including e.g. keeping oneself focused on something important.
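
A quick numerical sketch of the Amdahl's Law point, using the standard formula: if a fraction `p` of the total cost sits in one bottleneck and that bottleneck is sped up by a factor `s`, the overall speedup is `1 / ((1 - p) + p / s)`. Applying this to research attention is only an analogy, and the value `p = 0.5` below is a made-up illustration, not an estimate.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the cost is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical: the main bottleneck accounts for half of the total cost.
p = 0.5
for s in (2, 10, float("inf")):
    print(f"bottleneck sped up {s}x -> overall {amdahl_speedup(p, s):.2f}x")
```

Even removing the main bottleneck entirely (`s` going to infinity) caps the overall gain at `1 / (1 - p)`; whatever remains is then set by the other, weaker bottlenecks.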

Replies from: Raemon

↑ comment by Raemon · 2025-03-10T19:55:08.021Z · LW(p) · GW(p)

main bottleneck to counterfactuality

I don't think the social thing ranks above "be able to think useful important thoughts at all". (But maybe otherwise agree with the rest of your model as an important thing to think about)

[edit: hrm, "for smart people with a strong technical background" might be doing most of the work here]

↑ comment by faul_sname · 2025-03-10T16:20:13.027Z · LW(p) · GW(p)

Plausibly going off into the woods decreases the median output while increasing the variance.

↑ comment by Garrett Baker (D0TheMath) · 2025-03-10T16:22:32.532Z · LW(p) · GW(p)

it's IMO likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on

Why do you think this? When I try to think of concrete examples here, it's all confounded by the relevant smart people having social circles not working on useful problems.

I also think that 2 becomes more true once the relevant smart person already wants to solve alignment, or otherwise is already barking up the right tree.

↑ comment by the gears to ascension (lahwran) · 2025-03-10T08:35:45.699Z · LW(p) · GW(p)

One need not go off into the woods indefinitely, though.

Replies from: mateusz-baginski

↑ comment by Mateusz Bagiński (mateusz-baginski) · 2025-03-10T10:43:40.462Z · LW(p) · GW(p)

I don't think I implied that John's post implied that and I don't think going into the woods non-indefinitely mitigates the thing I pointed out.

↑ comment by Rauno Arike (rauno-arike) · 2025-03-10T21:46:16.039Z · LW(p) · GW(p)

As a counterpoint to the "go off into the woods" strategy, Richard Hamming said the following in "You and Your Research", describing his experience at Bell Labs:

Thus what you consider to be good working conditions may not be good for you! There are many illustrations of this point. For example, working with one’s door closed lets you get more work done per year than if you had an open door, but I have observed repeatedly that later those with the closed doors, while working just as hard as others, seem to work on slightly the wrong problems, while those who have let their door stay open get less work done but tend to work on the right problems! I cannot prove the cause-and-effect relationship; I can only observe the correlation. I suspect the open mind leads to the open door, and the open door tends to lead to the open mind; they reinforce each other.

Bell Labs certainly produced a lot of counterfactual research, Shannon's information theory being the prime example. I suppose Bell Labs might have been well-described as a group that could maintain its own attention, though.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-10T22:02:31.505Z · LW(p) · GW(p)

Bell Labs is actually my go-to example of a much-hyped research institution whose work was mostly not counterfactual; see e.g. here [LW(p) · GW(p)]. Shannon's information theory is the only major example I know of highly counterfactual research at Bell Labs. Most of the other commonly-cited advances, like e.g. transistors or communication satellites or cell phones, were clearly not highly counterfactual when we look at the relevant history: there were other groups racing to make the transistor, and the communication satellite and cell phones were both old ideas waiting on the underlying technology to make them practical.

That said, Hamming did sit right next to Shannon during the information theory days IIRC, so his words do carry substantial weight here.

↑ comment by leogao · 2025-03-10T15:40:05.513Z · LW(p) · GW(p)

solution 3 is to be an iconoclast and to feel comfortable pushing against the flow and to try to prove everyone else wrong.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-10T17:37:54.704Z · LW(p) · GW(p)

Good idea, but... I would guess that basically everyone who knew me growing up would say that I'm exactly the right sort of person for that strategy. And yet, in practice, I still find it has not worked very well. My attention has in fact been unhelpfully steered by local memetic currents to a very large degree.

For instance, I do love proving everyone else wrong, but alas reversed stupidity is not intelligence [LW · GW]. People mostly don't argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one's attention steered by the prevailing memetic currents.

Replies from: TsviBT

↑ comment by TsviBT · 2025-03-10T18:28:09.394Z · LW(p) · GW(p)

People mostly don't argue against the high-counterfactuality important things, they ignore the high-counterfactuality important things. Trying to prove them wrong about the things they do argue about is just another way of having one's attention steered by the prevailing memetic currents.

This is true, but I still can't let go of the fact that this fact itself ought to be a blindingly obvious first-order bit that anyone who calls zerself anything like "aspiring rationalist" would be paying a good chunk of attention to, and yet this does not seem to be the case. Like, motions in the genre of

huh I just had reaction XYZ to idea ABC generated by a naively-good search process, and it seems like this is probably a common reaction to ABC; but if people tend to react to ABC with XYZ, and with other things coming from the generators of XYZ, then such and such distortion in beliefs/plans would be strongly pushed into the collective consciousness, e.g. on first-order or on higher-order deference effects [LW · GW] ; so I should look out for that, e.g. by doing some manual fermi estimates or other direct checking about ABC or by investigating the strength of the steelman of reaction XYZ, or by keeping an eye out for people systematically reacting with XYZ without good foundation so I can notice this,

where XYZ could centrally be things like e.g. copium or subtly contemptuous indifference, do not seem to be at all common motions.

Replies from: Morpheus

↑ comment by Morpheus · 2025-03-13T23:55:08.184Z · LW(p) · GW(p)

So I should look out for that, e.g. by doing some manual fermi estimates or other direct checking about ABC or by investigating the strength of the steelman of reaction XYZ, or by keeping an eye out for people systematically reacting with XYZ without good foundation so I can notice this,

Accusing people in my head of not being numerate enough [LW · GW] when this happens has helped, because then I don't want to be a hypocrite. GPT-4o or o1 are good at Fermi estimates, making this even easier.

↑ comment by Viliam · 2025-03-10T08:50:19.409Z · LW(p) · GW(p)

build a social circle which can maintain its own attention, as a group, without just reflecting the memetic currents of the world around it.

Note that it is not necessary for the social circle to share your beliefs, only to have a social norm that people express interest in each other's work. Could be something like: once or twice in a week the people will come to a room and everyone will give a presentation about what they have achieved recently, and maybe the other people will provide some feedback (not in the form of "why don't you do Y instead", but with the assumption that X is a thing worth doing).

↑ comment by Cole Wyeth (Amyr) · 2025-03-10T19:23:43.252Z · LW(p) · GW(p)

How would this model treat mathematicians working on hard open problems? P vs NP might be counterfactual just because no one else is smart enough or has the right advantage to solve it. Insofar as central problems of a field have been identified but not solved, I’m not sure your model gives good advice.

Replies from: dmurfet

↑ comment by Daniel Murfet (dmurfet) · 2025-03-11T12:11:13.907Z · LW(p) · GW(p)

I visited Mikhail Khovanov once in New York to give a seminar talk, and after it was all over and I was wandering around seeing the sights, he gave me a call and offered a long string of general advice on how to be the kind of person who does truly novel things (he's famous for this, you can read about Khovanov homology). One thing he said was "look for things that aren't there" haha. It's actually very practical advice, which I think about often and attempt to live up to!

Replies from: adele-lopez-1

↑ comment by Adele Lopez (adele-lopez-1) · 2025-03-11T17:46:03.900Z · LW(p) · GW(p)

What else did he say? (I'd love to hear even the "obvious" things he said.)

Replies from: dmurfet

↑ comment by Daniel Murfet (dmurfet) · 2025-03-11T20:35:48.493Z · LW(p) · GW(p)

I'm ashamed to say I don't remember. That was the highlight. I think I have some notes on the conversation somewhere and I'll try to remember to post here if I ever find it.

I can spell out the content of his Koan a little, if it wasn't clear. It's probably more like: look for things that are (not there). If you spend enough time in a particular landscape of ideas, you can (if you're quiet and pay attention and aren't busy jumping on bandwagons) get an idea of a hole, which you're able to walk around but can't directly see. In this way new ideas appear as something like residues from circumnavigating these holes. It's my understanding that Khovanov homology was discovered like that, and this is not unusual in mathematics.

By the way, that's partly why I think the prospect of AIs being creative mathematicians in the short term should not be discounted; if you see all the things, you see all the holes.

Replies from: alexander-gietelink-oldenziel, danielechlin, TsviBT

↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-03-12T17:58:48.772Z · LW(p) · GW(p)

For those who might not have noticed Dan's clever double entendre: (Khovanov) homology is literally about counting/measuring holes in weird high-dimensional spaces - designing a new homology theory is in a very real sense about looking for holes that are not (yet) there.
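
To give a flavor of "counting holes" in the simplest possible setting, here is a small sketch. Note the swap: this is ordinary homology of a graph (a 1-dimensional simplicial complex), not Khovanov homology, which is far more involved. For a simple graph, the Betti number b0 counts connected components and b1 = E - V + b0 counts independent loops. The function name and the example graphs are made up for illustration.

```python
def betti_numbers_of_graph(vertices, edges):
    """Return (b0, b1) for a simple graph: components and independent loops."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    b0 = len({find(v) for v in vertices})
    b1 = len(edges) - len(vertices) + b0
    return b0, b1


# A hollow triangle has one component and one hole.
hollow_triangle = betti_numbers_of_graph([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
# A path has one component and no holes.
path = betti_numbers_of_graph([0, 1, 2], [(0, 1), (1, 2)])
print(hollow_triangle, path)
```

The hole in the triangle is not any particular vertex or edge; it only shows up in the bookkeeping of how the pieces fit together, which is roughly the sense in which one can "walk around" a hole without seeing it directly.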

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2025-03-15T11:07:17.390Z · LW(p) · GW(p)

Are there any examples yet, of homology or cohomology being applied to cognition, whether human or AI?

Replies from: dmurfet, alexander-gietelink-oldenziel, D0TheMath

↑ comment by Daniel Murfet (dmurfet) · 2025-03-16T05:57:33.987Z · LW(p) · GW(p)

There's plenty, including a line of work by Carina Curto, Kathryn Hess and others that is taken seriously by a number of mathematically inclined neuroscience people (Tom Burns if he's reading can comment further). As far as I know this kind of work is the closest to breaking through into the mainstream. At some level you can think of homology as a natural way of preserving information in noisy systems, for reasons similar to why (co)homology of tori was a useful way for Kitaev to formulate his surface code. Whether or not real brains/NNs have some emergent computation that makes use of this is a separate question; I'm not aware of really compelling evidence.

There is more speculative but definitely interesting work by Matilde Marcolli. I believe Manin has thought about this (because he's thought about everything) and if you have twenty years to acquire the prerequisites (gamma spaces!) you can gaze into deep pools by reading that too.

↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-03-15T20:09:18.353Z · LW(p) · GW(p)

No.

↑ comment by Garrett Baker (D0TheMath) · 2025-03-16T03:19:08.501Z · LW(p) · GW(p)

Topological data analysis comes closest, and there are some people who try to use it for ML, eg [LW · GW].

Though my understanding is this is used in interp, not so much because people necessarily expect deep connections to homology, but because it's just another way to look for structure in your data.

TDA itself is also a relatively shallow tool.

Replies from: Lorxus

↑ comment by Lorxus · 2025-03-16T17:06:59.751Z · LW(p) · GW(p)

As someone who does both data analysis and algebraic topology, my take is that TDA showed promise but ultimately there's something missing such that it's not at full capacity. Either the formalism isn't developed enough or it's being consistently used on the wrong kinds of datasets. Which is kind of a shame, because it's the kind of thing that should work beautifully and in some cases even does!

↑ comment by danielechlin · 2025-03-13T17:18:24.610Z · LW(p) · GW(p)

I thought it might be "look for things that might not even be there as hard as you would if they are there." Then the koan form takes it closer to "the thereness of something just has little relevance on how hard you look for it." But it needs to get closer to the "biological" part of your brain, where you're not faking it with all your mental and bodily systems, like when your blood pressure rises from "truly believing" a lion is around the corner but wouldn't if you "fake believe" it.

Replies from: Lorxus

↑ comment by Lorxus · 2025-03-15T23:25:11.176Z · LW(p) · GW(p)

I imagine it's something like "look for things that are notably absent, when you would expect them to have been found if there"?

↑ comment by TsviBT · 2025-03-12T08:51:38.654Z · LW(p) · GW(p)

Some things even withdraw. https://tsvibt.blogspot.com/2023/05/the-possible-shared-craft-of-deliberate.html#aside-on-withdrawal-and-the-leap https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html#withdrawal

↑ comment by ozziegooen · 2025-03-13T18:13:30.909Z · LW(p) · GW(p)

Obvious point - I think a lot of this comes from the financial incentives. The more "out of the box" you go, the less sure you can be that there will be funding for your work.

Some of those that do this will be rewarded, but I suspect many won't be.

As such, I think that funders can help more to encourage this sort of thing, if they want to.

comment by johnswentworth · 2024-10-31T17:36:29.582Z · LW(p) · GW(p)

Conjecture's Compendium is now up. It's intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it's probably the best first source to link e.g. policymakers to right now.

I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.

Replies from: akash-wasil, aysja, akash-wasil, akash-wasil, nathan-helm-burger, nathan-helm-burger, anonce

↑ comment by Orpheus16 (akash-wasil) · 2024-11-01T14:01:20.395Z · LW(p) · GW(p)

I think there's something about Bay Area culture that can often get technical people to feel like the only valid way to contribute is through technical work. It's higher status and sexier and there's a default vibe that the best way to understand/improve the world is through rigorous empirical research.

I think this is an incorrect (or at least incomplete) frame, and I think on-the-margin it would be good for more technical people to spend 1-5 days seriously thinking about what alternative paths they could pursue in comms/policy.

I also think there are memes spreading around that you need to be some savant political mastermind genius to do comms/policy, otherwise you will be net negative. The more I meet policy people (including successful policy people from outside the AIS bubble), the more I think this narrative was, at best, an incorrect model of the world. At worst, a take that got amplified in order to prevent people from interfering with the AGI race (e.g., by granting excess status+validity to people/ideas/frames that made it seem crazy/unilateralist/low-status to engage in public outreach, civic discourse, and policymaker engagement.)

(Caveat: I don't think the adversarial frame explains everything, and I do think there are lots of people who were genuinely trying to reason about a complex world and just ended up underestimating how much policy interest there would be and/or overestimating the extent to which labs would be able to take useful actions despite the pressures of race dynamics.)

↑ comment by aysja · 2024-11-05T05:18:09.837Z · LW(p) · GW(p)

I think I probably agree, although I feel somewhat wary about it. My main hesitations are:

The lack of epistemic modifiers seems off to me, relative to the strength of the arguments they’re making. Such that while I agree with many claims, my imagined reader who is coming into this with zero context is like “why should I believe this?” E.g., “Without intervention, humanity will be summarily outcompeted and relegated to irrelevancy,” which like, yes, but also—on what grounds should I necessarily conclude this? They gave some argument along the lines of “intelligence is powerful,” and that seems probably true, but imo not enough to justify the claim that it will certainly lead to our irrelevancy. All of this would be fixed (according to me) if it were framed more as like “here are some reasons you might be pretty worried,” of which there are plenty, or "here's what I think," rather than “here is what will definitely happen if we continue on this path,” which feels less certain/obvious to me.
Along the same lines, I think it’s pretty hard to tell whether this piece is in good faith or not. E.g., in the intro Connor writes “The default path we are on now is one of ruthless, sociopathic corporations racing toward building the most intelligent, powerful AIs as fast as possible to compete with one another and vie for monopolization and control of both the market and geopolitics.” Which, again, I don’t necessarily disagree with, but my imagined reader with zero context is like “what, really? sociopaths? control over geopolitics?” I.e., I’m expecting readers to question the integrity of the piece, and to be more unsure of how to update on it (e.g. "how do I know this whole thing isn't just a strawman?" etc.).
There are many places where they kind of just state things without justifying them much. I think in the best case this might cause readers to think through whether such claims make sense (either on their own, or by reading the hyperlinked stuff—both of which put quite a lot of cognitive load on them), and in the worst case just causes readers to either bounce or kind of blindly swallow what they’re saying. E.g., “Black-Box Evaluations can only catch all relevant safety issues insofar as we have either an exhaustive list of all possible failure modes, or a mechanistic model of how concrete capabilities lead to safety risks.” They say this without argument and then move on. And although I agree with them (having spent a lot of time thinking this through myself), it’s really not obvious at first blush. Why do you need an exhaustive list? One might imagine, for instance, that a small number of tests would generalize well. And do you need mechanistic models? Sometimes medicines work safely without that, etc., etc. I haven’t read the entire Compendium closely, but my sense is that this is not an isolated incident. And I don't think this is a fatal flaw or anything—they're moving through a ton of material really fast and it's hard to give a thorough account of all claims—but it does make me more hesitant to use it as the default "here's what's happening" document.

All of that said, I do broadly agree with the set of arguments, and I think it’s a really cool activity for people to write up what they believe. I’m glad they did it. But I’m not sure how comfortable I feel about sending it to people who haven’t thought much about AI.

↑ comment by Orpheus16 (akash-wasil) · 2024-11-01T14:01:09.926Z · LW(p) · GW(p)

One of the common arguments in favor of investing more resources into current governance approaches (e.g., evals, if-then plans, RSPs) is that there's nothing else we can do. There's not a better alternative– these are the only things that labs and governments are currently willing to support.

The Compendium argues that there are other (valuable) things that people can do, with most of these actions focusing on communicating about AGI risks. Examples:

Share a link to this Compendium online or with friends, and provide your feedback on which ideas are correct and which are unconvincing. This is a living document, and your suggestions will shape our arguments.
Post your views on AGI risk to social media, explaining why you believe it to be a legitimate problem (or not).
Red-team companies’ plans to deal with AI risk, and call them out publicly if they do not have a legible plan.

One possible critique is that their suggestions are not particularly ambitious. This is likely because they're writing for a broader audience (people who haven't been deeply engaged in AI safety).

For people who have been deeply engaged in AI safety, I think the natural steelman here is "focus on helping the public/government better understand the AI risk situation."

There are at least some impactful and high-status examples of this (e.g., Hinton, Bengio, Hendrycks). I think in the last few years, for instance, most people would agree that Hinton/Bengio/Hendrycks have had far more impact in their communications/outreach/policy work than their technical research work.

And it's not just the famous people– I can think of ~10 junior or mid-career people who left technical research in the last year to help policymakers better understand AI progress and AI risk, and I think their work is likely far more impactful than if they had stayed in technical research. (And I'm even excluding people who are working on evals/if-then plans: like, I'm focusing on people who see their primary purpose as helping the public or policymakers develop "situational awareness", develop stronger models of AI progress and AI risk, understand the conceptual arguments for misalignment risk, etc.)

↑ comment by Orpheus16 (akash-wasil) · 2024-11-01T13:41:55.284Z · LW(p) · GW(p)

I appreciated their section on AI governance. The "if-then"/RSP/preparedness frame has become popular, and they directly argue for why they oppose this direction. (I'm a fan of preparedness efforts– especially on the government level– but I think it's worth engaging with the counterarguments.)

Pasting some content from their piece below.

High-level thesis against current AI governance efforts:

The majority of existing AI safety efforts are reactive rather than proactive, which inherently puts humanity in the position of managing risk rather than controlling AI development and preventing it.

Critique of reactive frameworks:

1. The reactive framework reverses the burden of proof from how society typically regulates high-risk technologies and industries.
In most areas of law, we do not wait for harm to occur before implementing safeguards. Banks are prohibited from facilitating money laundering from the moment of incorporation, not after their first offense. Nuclear power plants must demonstrate safety measures before operation, not after a meltdown.
The reactive framework problematically reverses the burden of proof. It assumes AI systems are safe by default and only requires action once risks are detected. One of the core dangers of AI systems is precisely that we do not know what they will do or how powerful they will be before we train them. The if-then framework opts to proceed until problems arise, rather than pausing development and deployment until we can guarantee safety. This implicitly endorses the current race to AGI.
This reversal is exactly what makes the reactive framework preferable for AI companies.

Critique of waiting for warning shots:

3. The reactive framework incorrectly assumes that an AI “warning shot” will motivate coordination.
Imagine an extreme situation in which an AI disaster serves as a “warning shot” for humanity. This would imply that powerful AI has been developed and that we have months (or less) to develop safety measures or pause further development. After a certain point, an actor with sufficiently advanced AI may be ungovernable, and misaligned AI may be uncontrollable.
When horrible things happen, people do not suddenly become rational. In the face of an AI disaster, we should expect chaos, adversariality, and fear to be the norm, making coordination very difficult. The useful time to facilitate coordination is before disaster strikes.
However, the reactive framework assumes that this is essentially how we will build consensus in order to regulate AI. The optimistic case is that we hit a dangerous threshold before a real AI disaster, alerting humanity to the risks. But history shows that it is exactly in such moments that these thresholds are most contested; this shifting of the goalposts is known as the AI Effect and common enough to have its own Wikipedia page. Time and again, AI advancements have been explained away as routine processes, whereas “real AI” is redefined to be some mystical threshold we have not yet reached. Dangerous capabilities are similarly contested as they arise, such as how recent reports of OpenAI’s o1 being deceptive have been questioned [LW(p) · GW(p)].
This will become increasingly common as competitors build increasingly powerful capabilities and approach their goal of building AGI. Universally, powerful stakeholders fight for their narrow interests, and for maintaining the status quo, and they often win, even when all of society is going to lose. Big Tobacco didn’t pause cigarette-making when they learned about lung cancer; instead they spread misinformation and hired lobbyists. Big Oil didn’t pause drilling when they learned about climate change; instead they spread misinformation and hired lobbyists. Likewise, now that billions of dollars are pouring into the creation of AGI and superintelligence, we’ve already seen competitors fight tooth and nail to keep building. If problems arise in the future, of course they will fight for their narrow interests, just as industries always do. And as the AI industry gets larger, more entrenched, and more essential over time, this problem will grow rapidly worse.

Replies from: bogdan-ionut-cirstea

↑ comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-02T09:08:49.659Z · LW(p) · GW(p)

how recent reports of OpenAI’s o1 being deceptive have been questioned [LW(p) · GW(p)].

This seems to be confusing a dangerous capability eval (of being able to 'deceive' in a visible scratchpad) with an assessment of alignment, which seems like exactly what the 'questioning' was about.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-11-01T04:30:30.264Z · LW(p) · GW(p)

I like it. I do worry that it, and The Narrow Path, are both missing how hard it will be to govern and restrict AI.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-11-01T15:48:46.527Z · LW(p) · GW(p)

My own attempt is much less well written and comprehensive, but I think I hit on some points that theirs misses: https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy [LW · GW]

↑ comment by epistemic meristem (anonce) · 2024-11-04T02:38:36.786Z · LW(p) · GW(p)

(There was already a linkpost [LW · GW].)

comment by johnswentworth · 2024-06-21T16:50:23.675Z · LW(p) · GW(p)

NVIDIA Is A Terrible AI Bet

Short version: Nvidia's only moat is in software; AMD already makes flatly superior hardware priced far lower, and Google probably does too but doesn't publicly sell it. And if AI undergoes smooth takeoff on current trajectory, then ~all software moats will evaporate early.

Long version: Nvidia is pretty obviously in a hype-driven bubble right now. However, it is sometimes the case that (a) an asset is in a hype-driven bubble, and (b) it's still a good long-run bet at the current price, because the company will in fact be worth that much. Think Amazon during the dot-com bubble. I've heard people make that argument about Nvidia lately, on the basis that it will be ridiculously valuable if AI undergoes smooth takeoff on the current apparent trajectory.

My core claim here is that Nvidia will not actually be worth much, compared to other companies, if AI undergoes smooth takeoff on the current apparent trajectory.

Other companies already make ML hardware flatly superior to Nvidia's (in flops, memory, whatever), and priced much lower. AMD's MI300x is the most obvious direct comparison. Google's TPUs are probably another example, though they're not sold publicly so harder to know for sure.

So why is Nvidia still the market leader? No secret there: it's the CUDA libraries. Lots of (third-party) software is built on top of CUDA, and if you use non-Nvidia hardware then you can't use any of that software.

That's exactly the sort of moat which will disappear rapidly if AI automates most-or-all software engineering, and on current trajectory software engineering would be one of the earlier areas to see massive AI acceleration. In that world, it will be easy to move any application-level program to run on any lower-level stack, just by asking an LLM to port it over.

So in worlds where AI automates software engineering to a very large extent, Nvidia's moat is gone, and their competition has an already-better product at already-lower price.

Replies from: MichaelStJules, PeterMcCluskey, JamesPayor, tao-lin, ann-brown, jmh, scrafty, havdvdbd, o-o

↑ comment by MichaelStJules · 2024-06-22T13:34:53.578Z · LW(p) · GW(p)

Why do you believe AMD and Google make better hardware than Nvidia?

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-06-22T18:25:58.551Z · LW(p) · GW(p)

The easiest answer is to look at the specs. Of course specs are not super reliable, so take it all with many grains of salt. I'll go through the AMD/Nvidia comparison here, because it's a comparison I looked into a few months back.

MI300x vs H100

Techpowerup is a third-party site with specs for the MI300x and the H100, so we can do a pretty direct comparison between those two pages. (I don't know if the site independently tested the two chips, but they're at least trying to report comparable numbers.) The H200 would arguably be more of a "fair comparison" since the MI300x came out much later than the H100; we'll get to that comparison next. I'm starting with MI300x vs H100 comparison because techpowerup has specs for both of them, so we don't have to rely on either company's bullshit-heavy marketing materials as a source of information. Also, even the H100 is priced 2-4x more expensive than the MI300x (~$30-45k vs ~$10-15k), so it's not unfair to compare the two.

Key numbers (MI300x vs H100):

float32 TFLOPs: ~80 vs ~50
float16 TFLOPs: ~650 vs ~200
memory: 192 GB vs 80 GB (note that this is the main place where the H200 improves on the H100)
bandwidth: ~10 TB/s vs ~2 TB/s

... so the comparison isn't even remotely close. The H100 is priced 2-4x higher but is utterly inferior in terms of hardware.

MI300x vs H200

I don't know of a good third-party spec sheet for the H200, so we'll rely on Nvidia's page. Note that they report some numbers "with sparsity" which, to make a long story short, means those numbers are blatant marketing bullshit. Other than those numbers, I'll take their claimed specs at face value.

Key numbers (MI300x vs H200):

float32 TFLOPs: ~80 vs ~70
float16 TFLOPs: don't know, Nvidia conspicuously avoided reporting that number
memory: 192 GB vs 141 GB
bandwidth: ~10 TB/s vs ~5 TB/s

So they're closer than the MI300x vs H100, but the MI300x still wins across the board. And pricewise, the H200 is probably around $40k, so 3-4x more expensive than the MI300x.
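To make the two comparisons above easier to check, here is a minimal sketch that recomputes the ratios. Every number is an approximate figure quoted in this comment (including the rough price ranges), not an independently verified spec; the dictionary layout and function names are just illustrative choices.

```python
# Approximate figures quoted in the comment above (not independently verified).
# None marks a number the comment says was not reported.
SPECS = {
    "MI300x": {"fp32_tflops": 80, "fp16_tflops": 650, "memory_gb": 192, "bandwidth_tbs": 10},
    "H100": {"fp32_tflops": 50, "fp16_tflops": 200, "memory_gb": 80, "bandwidth_tbs": 2},
    "H200": {"fp32_tflops": 70, "fp16_tflops": None, "memory_gb": 141, "bandwidth_tbs": 5},
}

# Rough (low, high) price ranges in USD from the comment; the H200 entry is a single guess.
PRICE_USD = {
    "MI300x": (10_000, 15_000),
    "H100": (30_000, 45_000),
    "H200": (40_000, 40_000),
}


def spec_ratios(a, b):
    """Return {spec: a / b} for every spec reported for both chips."""
    return {
        key: SPECS[a][key] / SPECS[b][key]
        for key in SPECS[a]
        if SPECS[a][key] is not None and SPECS[b][key] is not None
    }


def price_ratio_range(expensive, cheap):
    """Return the widest (low, high) bounds on price(expensive) / price(cheap)."""
    low_e, high_e = PRICE_USD[expensive]
    low_c, high_c = PRICE_USD[cheap]
    return low_e / high_c, high_e / low_c


for rival in ("H100", "H200"):
    print(f"MI300x vs {rival}:")
    for key, value in spec_ratios("MI300x", rival).items():
        print(f"  {key}: {value:.2f}x")
    low, high = price_ratio_range(rival, "MI300x")
    print(f"  {rival} price / MI300x price: {low:.1f}x to {high:.1f}x")
```

The price bounds pair the extreme ends of each range, so they come out a bit wider than the rounded "2-4x" and "3-4x" figures in the text.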

Replies from: ryan_greenblatt

↑ comment by ryan_greenblatt · 2024-06-22T21:09:20.713Z · LW(p) · GW(p)

It's worth noting that even if nvidia is charging 2-4x more now, the ultimate question for competitiveness will be manufacturing cost for nvidia vs amd. If nvidia has much lower manufacturing costs than amd per unit performance (but presumably higher markup), then nvidia might win out even if their product is currently worse per dollar.

Note also that price discrimination might be a big part of nvidia's approach. Scaling labs which are willing to go to great effort to drop compute cost by a factor of two are a subset of nvidia's customers where nvidia would ideally prefer to offer lower prices. I expect that nvidia will find a way to make this happen.

↑ comment by PeterMcCluskey · 2024-06-23T03:33:53.734Z · LW(p) · GW(p)

I'm holding a modest long position in NVIDIA (smaller than my position in Google), and expect to keep it for at least a few more months. I expect I only need NVIDIA margins to hold up for another 3 or 4 years for it to be a good investment now.

It will likely become a bubble before too long, but it doesn't feel like one yet.

↑ comment by James Payor (JamesPayor) · 2024-06-21T17:38:56.160Z · LW(p) · GW(p)

While the first-order analysis seems true to me, there are mitigating factors:

AMD appears to be bungling on their GPUs being reliable and fast, and probably will for another few years. (At least, this is my takeaway from following the TinyGrad saga on Twitter...) Their stock is not valued as it should be for a serious contender with good fundamentals, and I think this may stay the case for a while, if not forever if things are worse than I realize.
NVIDIA will probably have very-in-demand chips for at least another chip generation due to various inertias.
There aren't many good-looking places for the large amount of money that wants to be long AI to go right now, and this will probably inflate prices for still a while across the board, in proportion to how relevant-seeming the stock is. NVDA rates very highly on this one.

So from my viewpoint I would caution against being short NVIDIA, at least in the short term.

↑ comment by Tao Lin (tao-lin) · 2024-06-21T18:48:30.002Z · LW(p) · GW(p)

No, the MI300x is not superior to Nvidia's chips, largely because it costs >2x as much to manufacture as Nvidia's chips.

↑ comment by Ann (ann-brown) · 2024-06-21T17:17:29.024Z · LW(p) · GW(p)

Potential counterpoints:

If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
The disadvantages of AMD software development potentially need to be addressed at levels not accessible to an arbitrary feral automated software engineer in the wild, to make the stack sufficiently usable. (A lot of actual human software engineers would like the chance.)
NVIDIA is training their own AIs, who are pretty capable.
NVIDIA can invest their current profits. (Revenues, not stock valuations.)

Replies from: gwern, ann-brown

↑ comment by gwern · 2024-06-21T23:10:27.631Z · LW(p) · GW(p)

If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.

I don't think the advantages would necessarily compound - quite the opposite, there are diminishing returns and I expect 'catchup'. The first-mover advantage neutralizes itself because a rising tide lifts all boats, and the additional data acts as a prior: you can define the advantage of a better model, due to any scaling factor, as equivalent to n additional datapoints. (See the finetuning transfer papers on this.) When a LLM can zero-shot a problem, that is conceptually equivalent to a dumber LLM which needs 3-shots, say. And so the advantages of a better model will plateau, and can be matched by simply some more data in-context - such as additional synthetic datapoints generated by self-play or inner-monologue etc. And the better the model gets, the more 'data' it can 'transfer' to a similar language to reach a given X% of coding performance. (Think about how you could easily transfer given access to an environment: just do self-play on translating any solved Python problem into the target language. You already, by stipulation, have an 'oracle' to check outputs of the target against, which can produce counterexamples.) To a sad degree, pretty much all programming languages are the same these days: ALGOL with C sugaring to various degrees and random ad hoc addons; a LLM which can master Python can master Javascript can master Typescript... The hard part is the non-programming-language parts, the algorithms and reasoning and being able to understand & model the implicit state updates - not memorizing the standard library of some obscure language.

So at some point, even if you have a model which is god-like at Python (at which point each additional Python datapoint adds basically next to nothing), you will find it is completely acceptable at JavaScript, say, or even your brand-new language with 5 examples which you already have on hand in the documentation. You don't need 'the best possible performance', you just need some level of performance adequate to achieve your goal. If the Python is 99.99% on some benchmark, you are probably fine with 99.90% performance in your favorite language. (Presumably there is some absolute level like 99% at which point automated CUDA -> ROCm becomes possible, and it is independent of whether some other language has even higher accuracy.) All you need is some minor reason to pay that slight non-Python tax. And that's not hard to find.

If AI automates most, but not all, software engineering

Also, I suspect that the task of converting CUDA code to ROCm code might well fall into the 'most' category rather than being the holdout programming tasks. This is a category of code ripe for automation: you have, again by stipulation, correct working code which can be imitated and used as an oracle autonomously to brute force translation, which usually has very narrow specific algorithmic tasks ('multiply this matrix by that matrix to get this third matrix; every number should be identical'), random test-cases are easy to generate (just big grids of numbers), and where the non-algorithmic number also has simple end-to-end metrics ('loss go down per wallclock second') to optimize. Compared to a lot of areas, like business logic or GUIs, this seems much more amenable to tasking LLMs with. geohot may lack the followthrough to make AMD GPUs work, and plow through papercut after papercut, but there would be no such problem for a LLM.
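As a rough illustration of the "working code as an oracle" idea, here is a minimal differential-testing sketch. It does not touch CUDA or ROCm at all: `reference_matmul` and `candidate_matmul` are hypothetical stand-ins for the original kernel and a ported one, and the tolerance, matrix sizes, and trial count are arbitrary assumptions.

```python
import random


def reference_matmul(a, b):
    """Stand-in for the trusted original kernel: plain triple-loop matrix multiply."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
    return out


def candidate_matmul(a, b):
    """Stand-in for the ported kernel: same math, different structure."""
    b_columns = list(zip(*b))  # transpose b so each column becomes a tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def random_matrix(rng, rows, cols):
    """Generate a 'big grid of numbers' test input."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def find_counterexample(reference, candidate, trials=200, tol=1e-9, seed=0):
    """Return the first (a, b) where outputs differ by more than tol, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        rows, inner, cols = (rng.randint(1, 8) for _ in range(3))
        a = random_matrix(rng, rows, inner)
        b = random_matrix(rng, inner, cols)
        expected, actual = reference(a, b), candidate(a, b)
        for row_expected, row_actual in zip(expected, actual):
            for e, x in zip(row_expected, row_actual):
                if abs(e - x) > tol:
                    return a, b
    return None


print(find_counterexample(reference_matmul, candidate_matmul))
```

A returned counterexample is exactly the sort of thing that can be fed back to whatever is doing the porting, human or LLM. A real port would also need performance checks, which this sketch ignores.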

So I agree with Wentworth that there seems to be a bit of a tricky transition here for Nvidia: it's never been worth the time & hassle to try to use an AMD GPU (although a few claim to have made it work out financially for them), because of the skilled labor and wallclock and residual technical risk and loss of flexibility and ecosystem; but if LLM coding works out well enough and intelligence becomes 'too cheap to meter', almost all of that goes away. Even ordinary unsophisticated GPU buyers will be able to tell their LLM to 'just make it work on my new GPU, OK? I don't care about the details, just let me know when you're done'. At this point, what is the value-add for Nvidia? If they cut down their fat margins and race to the bottom for the hardware, where do they go for the profits? The money all seems to be in the integration and services - none of which Nvidia is particularly good at. (They aren't even all that good at training LLMs! The Megatron series was a disappointment, like Megatron-NLG-530b is barely a footnote, and even the latest Nemo seems to barely match Llama-3-70b while being like 4x larger and thus more expensive to run.)

And this will be true of anyone who is relying on software lockin: if the lockin is because it would take a lot of software engineer time to do a reverse-engineering rewrite and replacement, then it's in serious danger in a LLM human coding level world. In a world where you can hypothetically spin up a thousand SWEs on a cloud service, tell them, 'write me an operating system like XYZ', and they do so overnight while you sleep, durable software moats are going to require some sort of mysterious blackbox like a magic API; anything which is so modularized as to fit on your own computer is also sufficiently modularized as to easily clone & replace...

Replies from: ann-brown

↑ comment by Ann (ann-brown) · 2024-06-22T01:30:29.790Z · LW(p) · GW(p)

It's probably worth mentioning that there's now a licensing barrier to running CUDA specifically through translation layers: https://www.tomshardware.com/pc-components/gpus/nvidia-bans-using-translation-layers-for-cuda-software-to-run-on-other-chips-new-restriction-apparently-targets-zluda-and-some-chinese-gpu-makers

This isn't a pure software engineering time lockin; some of that money is going to go to legal action looking for a hint big targets have done the license-noncompliant thing.

Edit: Additionally, I don't think a world where "most but not all" software engineering is automated is one where it will be a simple matter to spin up a thousand effective SWEs of that capability; I think there's first a world where that's still relatively expensive even if most software engineering is being done by automated systems. Paying $8000 for overnight service of 1000 software engineers would be a rather fine deal, currently, but still too much for most people.

Replies from: gwern

↑ comment by gwern · 2024-06-22T13:46:51.704Z · LW(p) · GW(p)

I don't think that will be at all important. You are creating alternate reimplementations of the CUDA API, you aren't 'translating' or decompiling it. And if you are buying billions of dollars of GPUs, you can afford to fend off some Nvidia probes and definitely can pay $0.000008b periodically for an overnighter. (Indeed, Nvidia needing to resort to such Oracle-like tactics is a bear sign.)

Replies from: ann-brown

↑ comment by Ann (ann-brown) · 2024-06-22T18:31:50.121Z · LW(p) · GW(p)

While there's truth in what you say, I also think a market that's running thousands of software engineers is likely to be hungry for as many good GPUs as the current manufacturers can make. NVIDIA not being able to sustain a relative monopoly forever still doesn't put it in a bad position.

Replies from: gwern, Radford Neal

↑ comment by gwern · 2024-06-22T22:08:08.109Z · LW(p) · GW(p)

People will hunger for all the GPUs they can get, but then that means that the favored alternative GPU 'manufacturer' simply buys out the fab capacity and does so. Nvidia has no hardware moat: they do not own any chip fabs, they don't own any wafer manufacturers, etc. All they do is design and write software and all the softer human-ish bits. They are not 'the current manufacturer' - that's everyone else, like TSMC or the OEMs. Those are the guys who actually manufacture things, and they have no particular loyalty to Nvidia. If AMD goes to TSMC and asks for a billion GPU chips, TSMC will be thrilled to sell the fab capacity to AMD rather than Nvidia, no matter how angry Jensen is.

So in a scenario like mine, if everyone simply rewrites for AMD, AMD raises its prices a bit and buys out all of the chip fab capacity from TSMC/Intel/Samsung/etc - possibly even, in the most extreme case, buying capacity from Nvidia itself, as it suddenly is unable to sell anything at its high prices that it may be trying to defend, and is forced to resell its reserved chip fab capacity in the resulting liquidity crunch. (No point in spending chip fab capacity on chips you can't sell at your target price and you aren't sure what you're going to do.) And if AMD doesn't do so, then player #3 does so, and everyone rewrites again (which will be easier the second time as they will now have extensive test suites, two different implementations to check correctness against, documentation from the previous time, and AIs which have been further trained on the first wave of work).

↑ comment by Radford Neal · 2024-06-22T22:14:01.253Z · LW(p) · GW(p)

But why would the profit go to NVIDIA, rather than TSMC? The money should go to the company with the scarce factor of production.

↑ comment by Ann (ann-brown) · 2024-06-21T18:45:00.761Z · LW(p) · GW(p)

(... lol. That snuck in without any conscious intent to imply anything, yes. I haven't even personally interacted with the open Nvidia models yet.)

I do think the analysis is a decent map to nibbling at NVIDIA's pie share if you happen to be a competitor already -- AMD, Intel, or Apple currently, to my knowledge, possibly Google depending what they're building internally and if they decide to market it more. Apple's machine learning ecosystem is a bit of a parallel one, but I'd be at least mildly interested in it from a development perspective, and it is making progress.

But when it comes to the hardware, this is a sector where it's reasonably challenging to conjure a competitor out of thin air still, so competitor behavior -- with all its idiosyncrasies -- is pretty relevant.

↑ comment by jmh · 2024-06-23T02:57:19.044Z · LW(p) · GW(p)

Two questions on this.

First, if AI is a big value driver, in a general economic sense, is your view that NVIDIA is overpriced against its future potential, or just that NVIDIA will underperform relative to other investment alternatives you see?

Second, and perhaps an odd and speculative (perhaps nonsense) thought. I would expect that in this area one might see some network effects in play as well, so I'm wondering if that might impact the AI engineering decisions on software. Could the AI software solutions look towards maximising the value of the installed network (AIs work better on a common chip and code infrastructure) rather than what one would conclude from looking at some isolated technical stats? A bit along the lines of why Beta was displaced by VHS despite being a better technology. If so, then it seems possible that NVIDIA could remain a leader and enjoy its current pricing powers (at least to some extent) for a fairly long period of time.

↑ comment by Josh You (scrafty) · 2024-06-30T23:25:19.291Z · LW(p) · GW(p)

AI that can rewrite CUDA is a ways off. It's possible that it won't be that far away in calendar time, but it is far away in terms of AI market growth and hype cycles. If GPT-5 does well, Nvidia will reap the gains more than AMD or Google.

↑ comment by havdvdbd · 2024-11-17T06:13:47.357Z · LW(p) · GW(p)

Transpiling assembly code written for one OS/kernel to assembly code for another OS/kernel, while taking advantage of the full speed of the processor, is a completely different task from transpiling, say, Java code into Python.

Also, the hardware/software abstraction might break. A python developer can say hardware failures are not my problem. An assembly developer working at an AGI lab needs to consider hardware failures as lost wallclock time in their company’s race to AGI, and will try to write code so that hardware failures don’t cause the company to lose time.

GPT4 definitely can’t do this type of work and I’ll bet a lot of money GPT5 can’t do it either. ASI can do it but there’s bigger considerations than whether Nvidia makes money there, such as whether we’re still alive and whether markets and democracy continue to exist. Making a guess of N for which GPT-N can get this done requires evaluating how hard of a software task this actually is, and your comment contains no discussion of this.

Have you looked at tinygrad’s codebase or spoken to George Hotz about this?

↑ comment by O O (o-o) · 2024-06-22T14:09:03.005Z · LW(p) · GW(p)

Shorting nvidia might be tricky. I’d short nvidia and long TSM or an index fund to be safe at some point. Maybe now? Typically the highest market cap stock has poor performance after it claims that spot.

comment by johnswentworth · 2025-01-26T20:12:08.245Z · LW(p) · GW(p)

Here's a side project David and I have been looking into, which others might have useful input on...

Background: Thyroid & Cortisol Systems

As I understand it, thyroid hormone levels are approximately-but-accurately described as the body's knob for adjusting "overall metabolic rate" or the subjective feeling of needing to burn energy. Turn up the thyroid knob, and people feel like they need to move around, bounce their leg, talk fast, etc (at least until all the available energy sources are burned off and they crash). Turn down the thyroid knob, and people are lethargic.

That sounds like the sort of knob which should probably typically be set higher, today, than was optimal in the ancestral environment. Not cranked up to 11; hyperthyroid disorders are in fact dangerous and unpleasant. But at least set to the upper end of the healthy range, rather than the lower end.

... and that's nontrivial. You can just dump the relevant hormones (T3/T4) into your body, but there's a control system which tries to hold the level constant. Over the course of months, the thyroid gland (which normally produces T4) will atrophy, as it shrinks to try to keep T4 levels fixed. Just continuing to pump T3/T4 into your system regularly will keep you healthy - you'll basically have a hypothyroid disorder, and supplemental T3/T4 is the standard treatment. But you better be ready to manually control your thyroid hormone levels indefinitely if you start down this path. Ideally, one would intervene further up the control loop in order to adjust the thyroid hormone set-point, but that's more of a research topic than a thing humans already have lots of experience with.
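Here is a toy sketch of the control-loop point, under heavy assumptions: a single hormone level, a "gland" whose output is slowly adjusted to push the level back to a set-point, and a constant external dose. The parameter values are invented and unitless, and the model is a generic negative-feedback loop, not a model of actual thyroid physiology.

```python
def simulate(dose, steps=500, setpoint=1.0, clearance=0.5, gain=0.05):
    """Toy negative-feedback loop; returns (hormone_level, gland_output) after `steps`.

    All parameters are made-up, unitless values, not physiological measurements.
    """
    level = setpoint
    gland = clearance * setpoint  # output that holds the set-point when dose is zero
    for _ in range(steps):
        error = setpoint - level
        # Hormone is produced by the gland, added by the dose, and cleared over time.
        level = level + gland + dose - clearance * level
        # The controller nudges gland output to close the error; output cannot go negative.
        gland = max(0.0, gland + gain * error)
    return level, gland


for dose in (0.0, 0.3, 0.8):
    level, gland = simulate(dose)
    print(f"dose={dose:.1f}  level={level:.3f}  gland_output={gland:.3f}")
```

In this toy setup a modest dose gets fully offset: the gland output drops and the level returns to the set-point. Only a dose larger than what the gland was producing raises the level, and by then the gland output has fallen to zero, which is the analogue of the atrophy described above. Shifting `setpoint` itself is the "intervene further up the control loop" option.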

So that's thyroid. We can tell a similar story about cortisol.

As I understand it, the cortisol hormone system is approximately-but-accurately described as the body's knob for adjusting/tracking stress. That sounds like the sort of knob which should probably be set lower, today, than was optimal in the ancestral environment. Not all the way down; problems would kick in. But at least set to the lower end of the healthy range.

... and that's nontrivial, because there's a control loop in place, etc. Ideally we'd intervene on the relatively-upstream parts of the control loop in order to change the set point.

We'd like to generalize this sort of reasoning, and ask: what are all the knobs of this sort which we might want to adjust relative to their ancestral environment settings?

Generalization

We're looking for signals which are widely broadcast throughout the body, and received by many endpoints. Why look for that type of thing? Because the wide usage puts pressure on the signal to "represent one consistent thing". It's not an accident that there are individual hormonal signals which are approximately-but-accurately described by the human-intuitive phrases "overall metabolic rate" or "stress". It's not an accident that those hormones' signals are not hopelessly polysemantic. If we look for widely-broadcast signals, then we have positive reason to expect that they'll be straightforwardly interpretable, and therefore the sort of thing we can look at and (sometimes) intuitively say "I want to turn that up/down".

Furthermore, since these signals are widely broadcast, they're the sort of thing which impacts lots of stuff (and is therefore impactful to intervene upon). And they're relatively easy to measure, compared to "local" signals.

The "wide broadcast" criterion helps focus our search a lot. For instance, insofar as we're looking for chemical signals throughout the whole body, we probably want species in the bloodstream; that's the main way a concentration could be "broadcast" throughout the body, rather than being a local signal. So, basically endocrine hormones.

Casting a slightly wider net, we might also be interested in:

Signals widely broadcast through the body by the nervous system.
Chemical signals widely broadcast through the brain specifically (since that's a particularly interesting/relevant organ).
Non-chemical signals widely broadcast through the brain specifically.

... and of course for all of these there will be some control system, so each has its own tricky question about how to adjust it.

Some Promising Leads, Some Dead Ends

With some coaxing, we got a pretty solid-sounding list of endocrine hormones out of the LLMs. There were some obvious ones on the list, including thyroid and cortisol systems, sex hormones, and pregnancy/menstruation signals. There were also a lot of signals for homeostasis of things we don't particularly want to adjust: salt balance, calcium, digestion, blood pressure, etc. There were several inflammation and healing signals, which we're interested in but haven't dug into yet. And then there were some cool ones: oxytocin (think mother-child bonding), endocannabinoids (think pot), satiety signals (think Ozempic). None of those really jumped out as clear places to turn a knob in a certain direction, other than obvious things like "take ozempic if you are even slightly overweight" and the two we already knew about (thyroid and cortisol).

Then there were neuromodulators. Here's the list we coaxed from the LLMs:

Dopamine: Tracks expected value/reward - how good things are compared to expectations.
Norepinephrine: Sets arousal/alertness level - how much attention and energy to devote to the current situation.
Serotonin: Regulates resource availability mindset - whether to act like resources are plentiful or scarce. Affects patience, time preference, and risk tolerance.
Acetylcholine: Controls signal-to-noise ratio in neural circuits - acts like a gain/precision parameter, determining whether to amplify precise differences (high ACh) or blur things together (low ACh).
Histamine: Manages the sleep/wake switch - promotes wakefulness and suppresses sleep when active.
Orexin: Acts as a stability parameter for brain states - increases the depth of attractor basins and raises transition barriers between states. Higher orexin = stronger attractors = harder to switch states.

Of those, serotonin immediately jumps out as a knob you'd probably want to turn to the "plentiful resources" end of the healthy spectrum, compared to the ancestral environment. That puts the widespread popularity of SSRIs in an interesting light!

Moving away from chemical signals, brain waves (alpha waves, theta oscillations, etc) are another potential category - they're oscillations at particular frequencies which (supposedly) are widely synced across large regions of the brain. I read up just a little, and so far have no idea how interesting they are as signals or targets.

Shifting gears, the biggest dead end so far has been parasympathetic tone, i.e. overall activation level of the parasympathetic nervous system. As far as I can tell, parasympathetic tone is basically Not A Thing: there are several different ways to measure it, and the different measurements have little correlation. It's probably more accurate to think of parasympathetic nervous activity as localized, without much meaningful global signal.

Anybody see obvious things we're missing?

Replies from: nathan-helm-burger, steve2152, pktechgirl, D0TheMath, maxwell-peterson, Thane Ruthenis, michael-roe, TrevorWiesinger, Jonas Hallgren, Jemist

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-27T00:03:38.953Z · LW(p) · GW(p)

Uh... Guys. Uh. Biology is complicated. It's a messy pile of spaghetti code. Not that it's entirely intractable to make Pareto improvements but, watch out for unintended consequences.

For instance: you are very wrong about cortisol. Cortisol is a "stress response hormone". It tells the body to divert resources to bracing itself to deal with stress (physical and/or mental). Experiments have shown that if you put someone through a stressful event while suppressing their cortisol, they have much worse outcomes (potentially including death). Cortisol doesn't make you stressed, it helps you survive stress. Deviations from homeostatic setpoints (including mental ones) are what make you stressed.

Replies from: anaguma

↑ comment by anaguma · 2025-01-27T18:34:05.403Z · LW(p) · GW(p)

This is interesting. Can you say more about these experiments?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-27T21:22:03.521Z · LW(p) · GW(p)

Hmm, I'll see if I can find some old papers.... I'm just reciting memories from grad school lectures like... 12 years ago. Here's an example of the finding being replicated and explored further in a primate model: https://www.jci.org/articles/view/112443

Here's a review of cortisol inhibition and surgery findings. A mixed bag, a complicated system. https://academic.oup.com/bja/article/85/1/109/263834

https://onlinelibrary.wiley.com/doi/abs/10.1111/ejn.15721 "Evidence suggests that psychological stress has effects on decision making, but the results are inconsistent, and the influence of cortisol and other modulating factors remains unclear. "

Basically, cortisol is helpful for surviving injuries. Is it helpful for mental stress? Unclear. Long term high cortisol is harmful, but the stress in one's life resulting in that high cortisol level is harmful in more ways than just high cortisol. So are there times when it would be helpful to reduce someone's cortisol level? Absolutely. But it's complicated and should be done thoughtfully and selectively, and in combination with other things (particularly seeking out and treating the upstream causes).

You can find lots more on Google scholar.

↑ comment by Steven Byrnes (steve2152) · 2025-01-26T23:55:39.327Z · LW(p) · GW(p)

I don’t think that any of {dopamine, NE, serotonin, acetylcholine} are scalar signals that are “widely broadcast through the brain”. Well, definitely not dopamine or acetylcholine, almost definitely not serotonin, maybe NE. (I recently briefly looked into whether the locus coeruleus sends different NE signals to different places at the same time, and ended up at “maybe”, see §5.3.1 here [LW · GW] for a reference.)

I don’t know anything about histamine or orexin, but neuropeptides are a better bet in general for reasons in §2.1 here [LW · GW].

As far as I can tell, parasympathetic tone is basically Not A Thing

Yeah, I recall reading somewhere that the term “sympathetic” in “sympathetic nervous system” is related to the fact that lots of different systems are acting simultaneously. “Parasympathetic” isn’t supposed to be like that, I think.

↑ comment by Elizabeth (pktechgirl) · 2025-01-27T20:28:55.211Z · LW(p) · GW(p)

We're looking for signals which are widely broadcast throughout the body, and received by many endpoints. Why look for that type of thing? Because the wide usage puts pressure on the signal to "represent one consistent thing". It's not an accident that there are individual hormonal signals which are approximately-but-accurately described by the human-intuitive phrases "overall metabolic rate" or "stress". It's not an accident that those hormones' signals are not hopelessly polysemantic. If we look for widely-broadcast signals, then we have positive reason to expect that they'll be straightforwardly interpretable, and therefore the sort of thing we can look at and (sometimes) intuitively say "I want to turn that up/down".

This sounds logical but I don't think it is backed empirically, at least to the degree you're claiming. Source: I have a biology BA and can't speak directly to the question because I never took those classes because they had reputations for being full of exceptions and memorization.

↑ comment by Garrett Baker (D0TheMath) · 2025-01-26T22:13:25.776Z · LW(p) · GW(p)

The most obvious one imo is the immune system & the signals it sends.

Others:

Circadian rhythm
Age is perhaps a candidate here, though it may be more or less a candidate depending on if you're talking about someone before or after 30
Hospice workers sometimes talk about the body "knowing how to die", maybe there's something to that

↑ comment by Maxwell Peterson (maxwell-peterson) · 2025-01-27T20:22:20.128Z · LW(p) · GW(p)

I had seen recommendations for T3/T4 on twitter to help with low energy, and even purchased some, but haven’t taken it. I hadn’t considered that the thyroid might respond by shrinking, and now think that that’s a worrying intervention! So I’m glad I read this - thank you.

↑ comment by Thane Ruthenis · 2025-01-27T20:59:39.949Z · LW(p) · GW(p)

I don't have deep expertise in the subject, but I'm inclined to concur with the people saying that the widely broadcast signals don't actually represent one consistent thing, despite your plausible argument to the contrary.

Here's a Scott Alexander post speculating why that might be the case. In short: there was an optimization pressure towards making internal biological signals very difficult to decode, because easily decodable signals were easy targets for parasites evolving to exploit them. As a result, the actual signals are probably represented as "unnecessarily" complicated, timing-based combinations of various "basic" chemical, electrical, etc. signals, and they're somewhat individualized to boot. You can't decode them just by looking at any one spatially isolated chunk of the body, by design.

Basically: separate chemical substances (and other components that look "simple" locally/from the outside) are not the privileged basis for decoding internal signals. They're the anti-privileged basis, if anything.

Replies from: steve2152

↑ comment by Steven Byrnes (steve2152) · 2025-01-30T14:15:43.144Z · LW(p) · GW(p)

Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.

…Except in the time domain, to a limited extent. For example, in rats, tonic oxytocin in the bloodstream controls natriuresis, while pulsed oxytocin in the bloodstream controls lactation and birth. The kidney puts a low-pass filter on its oxytocin detection system, and the mammary glands & uterus put a high-pass filter, so to speak.
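The low-pass versus high-pass distinction can be sketched numerically. This is a hedged illustration, not a model of any real receptor: `low_pass` is an exponential moving average standing in for a detector that reads the tonic level, `high_pass` is the signal minus that average standing in for a detector that reads pulses, and the signal shapes, the `alpha` smoothing constant, and the pulse spacing are all made-up values.

```python
def low_pass(signal, alpha=0.02):
    """Exponential moving average: tracks the slow (tonic) component."""
    out = []
    state = signal[0]
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def high_pass(signal, alpha=0.02):
    """Signal minus its slow component: responds only to fast changes."""
    return [x - s for x, s in zip(signal, low_pass(signal, alpha))]


n = 1000
tonic = [1.0] * n                                            # steady level
pulsed = [10.0 if t % 100 == 50 else 0.0 for t in range(n)]  # brief spikes

for name, sig in (("tonic", tonic), ("pulsed", pulsed)):
    slow_reading = low_pass(sig)[-1]
    fast_reading = max(abs(v) for v in high_pass(sig))
    print(f"{name:6s}  low-pass reading={slow_reading:.2f}  "
          f"high-pass peak={fast_reading:.2f}")
```

The steady signal registers on the low-pass detector and is invisible to the high-pass one, while the spiky signal does the reverse: its pulses dominate the high-pass reading but barely move the slow average. One molecule in one bloodstream can therefore carry two separable messages, as long as the receivers filter differently in time.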

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2025-01-30T15:14:20.901Z · LW(p) · GW(p)

Yeah but if something is in the general circulation (bloodstream), then it’s going everywhere in the body. I don’t think there’s any way to specifically direct it.

The point wouldn't be to direct it, but to have different mixtures of chemicals (and timings) to mean different things to different organs.

Loose analogy: Suppose that the intended body behaviors ("kidneys do X, heart does Y, brain does Z" for all combinations of X, Y, Z) are latent features, basic chemical substances and timings are components of the input vector, and there are dramatically more intended behaviors than input-vector components. Can we define the behavior-controlling function of organs (distributed across organs) such that, for any intended body behavior, there's a signal that sets the body into approximately this state?

It seems that yes [LW · GW]. The number of almost-orthogonal vectors in $d$ dimensions scales exponentially with $d$, so we simply need to make the behavior-controlling function sensitive to these almost-orthogonal directions, rather than the chemical-basis vectors. The mappings from the input vector to the output behaviors, for each organ, would then be some complicated mixtures, not a simple "chemical A sets all organs into behavior X".
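The geometric fact being leaned on here can be checked with a quick experiment. This is only a sketch: it draws random unit vectors (the helper names `random_unit_vector` and `overlap_stats`, the seed, and the dimensions are all arbitrary choices) and measures their pairwise overlaps. It does not demonstrate the exponential scaling itself, only that in a few hundred dimensions one can have many vectors that are pairwise nearly orthogonal, whereas in 3 dimensions the same number of vectors overlap heavily.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random by normalizing a Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def overlap_stats(dim, count, seed=0):
    """Return (mean, max) of |cosine| over all pairs of `count` random unit vectors."""
    rng = random.Random(seed)
    vecs = [random_unit_vector(dim, rng) for _ in range(count)]
    overlaps = [
        abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
        for i in range(count)
        for j in range(i + 1, count)
    ]
    return sum(overlaps) / len(overlaps), max(overlaps)


for dim in (3, 200):
    mean, worst = overlap_stats(dim, count=150)
    print(f"dim={dim:3d}  mean |cos|={mean:.3f}  max |cos|={worst:.3f}")
```

Whether a body could actually exploit such directions is a separate question from whether they exist. The experiment says nothing about how many of them can be driven independently at once.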

This analogy seems flawed in many ways, but I think something directionally-like-this might be happening?

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-01-30T17:14:30.071Z · LW(p) · GW(p)

Just because the number of almost-orthogonal vectors in $d$ dimensions scales exponentially with $d$, doesn't mean one can choose all those signals independently. We can still only choose $d$ real-valued signals at a time (assuming away the sort of tricks by which one encodes two real numbers in a single real number, which seems unlikely to happen naturally in the body). So "more intended behaviors than input-vector components" just isn't an option, unless you're exploiting some kind of low-information-density in the desired behaviors (like e.g. very "sparse activation" of the desired behaviors, or discreteness of the desired behaviors to a limited extent).
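This constraint can be written out as a short worked derivation. It assumes, as a simplification that nobody in the thread states, that each behavior is read off the signal by a linear map. The symbols are illustrative: $x \in \mathbb{R}^d$ is the vector of signal levels, $w_1, \dots, w_n$ are hypothetical unit-length readout directions with $n > d$, $y_i$ is the activation of behavior $i$, and $\epsilon$ bounds the pairwise overlaps.

```latex
% Linear readout: n behaviors read from d signals, with n > d.
y = W x, \qquad
W = \begin{pmatrix} w_1^\top \\ \vdots \\ w_n^\top \end{pmatrix}
  \in \mathbb{R}^{n \times d}, \qquad
\operatorname{rank}(W) \le d < n .

% The reachable activations form a subspace of dimension at most d,
% so a generic target y in R^n cannot be hit, however close to
% orthogonal the rows w_i are.

% Sparse exception: suppose |w_i \cdot w_j| \le \epsilon for i \ne j,
% and only the behaviors in a set S with |S| = k should be active.
x = \sum_{j \in S} w_j
\;\Longrightarrow\;
\begin{cases}
  |y_j - 1| \le (k - 1)\,\epsilon, & j \in S, \\
  |y_i| \le k\,\epsilon,           & i \notin S .
\end{cases}
```

Under these assumptions the almost-orthogonal directions only help when $k\epsilon \ll 1$, that is, when few behaviors are active at once. That is the "sparse activation" loophole, and it does not extend to setting all $n$ behaviors independently.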

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2025-01-30T18:47:52.337Z · LW(p) · GW(p)

The above toy model assumed that we're picking one signal at a time, and that each such "signal" specifies the intended behavior for all organs simultaneously...

... But you're right that the underlying assumption there was that the set of possible desired behaviors is discrete (i. e., that X in "kidneys do X" is a discrete variable, not a vector of reals). That might've indeed assumed me straight out of the space of reasonable toy models for biological signals, oops.

↑ comment by Michael Roe (michael-roe) · 2025-01-27T13:10:36.489Z · LW(p) · GW(p)

As someone who has Graves’ Disease … one of the reasons that you really don’t want to run your metabolism faster with higher T4 levels is that higher heart rate for an extended period can cause your heart to fail.

Replies from: michael-roe

↑ comment by Michael Roe (michael-roe) · 2025-01-27T14:16:35.271Z · LW(p) · GW(p)

More generally: changing the set point of any of these systems might cause the failure of some critical component that depends on the old value of the set point.

↑ comment by trevor (TrevorWiesinger) · 2025-01-26T21:08:56.919Z · LW(p) · GW(p)

Gwern gave a list in his Nootropics megapost.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-01-26T21:13:27.701Z · LW(p) · GW(p)

Yup, I'm familiar with that one. The big difference is that I'm backward-chaining, whereas that post forward chains; the hope of backward chaining would be to identify big things which aren't on people's radar as nootropics (yet).

(Relatedly: if one is following this sort of path, step 1 should be a broad nutrition panel and supplementing anything in short supply, before we get to anything fancier.)

↑ comment by Jonas Hallgren · 2025-01-26T22:28:34.593Z · LW(p) · GW(p)

So I find the question underspecified, why do you want this?

Why are you decomposing body signalling without looking at the major sub-regulatory systems? If you want to predict sleep then cortisol, melatonin, etc. are quite good, and this will tell you about stress regulation, which affects both endocrine as well as cortisol systems.

If you want to look at nutritional systems then GLP-1 activation is good for average food need whilst ghrelin is predictive of whether you will feel hungry at specific times.

If you're looking at brain health then serotonin activation patterns can be really good to check, but this is different from how the stomach uses it, and the stomach does have the majority of serotonin. But this is like way too simplified, especially for the brain.

Different subsystems use the same molecules in different ways, waste not and all that so what are you looking for and why?

↑ comment by J Bostock (Jemist) · 2025-01-26T21:24:38.284Z · LW(p) · GW(p)

Is there a particular reason to not include sex hormones? Some theories suggest that testosterone tracks relative social status. We might expect that high social status -> less stress (of the cortisol type) + more metabolic activity. Since it's used by trans people we have a pretty good idea of what it does to you at high doses (makes you hungry, horny, and angry) but it's unclear whether it actually promotes low cortisol-stress and metabolic activity.

comment by johnswentworth · 2024-10-27T19:52:59.226Z · LW(p) · GW(p)

AFAICT, approximately every "how to be good at conversation" guide says the same thing: conversations are basically a game where 2+ people take turns free-associating off whatever was said recently. (That's a somewhat lossy compression, but not that lossy.) And approximately every guide is like "if you get good at this free association game, then it will be fun and easy!". And that's probably true for some subset of people.

But speaking for myself personally... the problem is that the free-association game just isn't very interesting.

I can see where people would like it. Lots of people want to talk to other people more on the margin, and want to do difficult thinky things less on the margin, and the free-association game is great if that's what you want. But, like... that is not my utility function. The free association game is a fine ice-breaker, it's sometimes fun for ten minutes if I'm in the mood, but most of the time it's just really boring.

Replies from: MondSemmel, Jonas Hallgren, Thane Ruthenis, Jemist, Benito, Zvi, David Lorell, Raemon, johannes-c-mayer, lc, dennis-zoeller, TsviBT, MakoYass, wassname, tomcatfish, quailia, mr-hire

↑ comment by MondSemmel · 2024-10-27T20:18:30.400Z · LW(p) · GW(p)

Even for serious intellectual conversations, something I appreciate in this kind of advice is that it often encourages computational kindness [LW · GW]. E.g. it's much easier to answer a compact closed question like "which of these three options do you prefer" instead of an open question like "where should we go to eat for lunch". The same applies to asking someone about their research; not every intellectual conversation benefits from big open questions like the Hamming Question.

Replies from: wassname

↑ comment by wassname · 2024-11-04T08:09:30.730Z · LW(p) · GW(p)

I think this is especially important for me/us to remember. On this site we often have a complex way of thinking, and a high computational budget (because we like exercising our brains to failure) and if we speak freely to the average person, they may be annoyed at how hard it is to parse what we are saying.

We've all probably had this experience when genuinely trying to understand someone from a very different background. Perhaps they are trying to describe their inner experience when meditating, or Japanese poetry, or are simply from a different discipline. Or perhaps we were just very tired that day, meaning we had a low computational budget.

On the other hand, we are often a "tell" culture [? · GW], which has a lower computational load compared to ask or guess culture. As long as we don't tell too much.

↑ comment by Jonas Hallgren · 2024-10-27T20:28:44.481Z · LW(p) · GW(p)

Generally fair and I used to agree, I've been looking at it from a bit of a different viewpoint recently.

If we think of a "vibe" of a conversation as a certain shared prior that you're currently inhabiting with the other person then the free association game can rather be seen as a way of finding places where your world models overlap a lot.

My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think the vibe checking for shared priors is a skill that can be developed and the basis lies in being curious af.

There's apparently a lot of different related concepts in psychology about holding emotional space and other things that I think just come down to "find the shared prior and vibe there".

Replies from: TsviBT

↑ comment by TsviBT · 2024-10-28T05:24:19.573Z · LW(p) · GW(p)

Hm. This rings true... but also I think that selecting [vibes, in this sense] for attention also selects against [things that the other person is really committed to]. So in practice you're just giving up on finding shared commitments. I've been updating that stuff other than shared commitments is less good (healthy, useful, promising, etc.) than it seems.

Replies from: Jonas Hallgren

↑ comment by Jonas Hallgren · 2024-10-28T08:28:23.120Z · LW(p) · GW(p)

Hmm, I find that I'm not fully following here. I think "vibes" might be the thing that is messing it up.

Let's look at a specific example: I'm talking to a new person at an EA-adjacent event and we're just chatting about how the last year has been. Part of the "vibing" here might be to home in on the difficulties experienced in the last year due to a feeling of "moral responsibility"; in my view vibing doesn't have to be done with only positive emotions?

I think you're bringing up a good point that commitments or struggles might be something that bring people closer than positive feelings because you're more vulnerable and open as well as broadcasting your values more. Is this what you mean with shared commitments or are you pointing at something else?

Replies from: TsviBT

↑ comment by TsviBT · 2024-10-28T11:53:17.958Z · LW(p) · GW(p)

Closeness is the operating drive, but it's not the operating telos. The drive is towards some sort of state or feeling--of relating, standing shoulder-to-shoulder looking out at the world, standing back-to-back defending against the world; of knowing each other, of seeing the same things, of making the same meaning; of integrated seeing / thinking. But the telos is tikkun olam (repairing/correcting/reforming the world)--you can't do that without a shared idea of better.

As an analogy, curiosity is a drive, which is towards confusion, revelation, analogy, memory; but the telos is truth and skill.

In your example, I would say that someone could be struggling with "moral responsibility" while also doing a bunch of research or taking a bunch of action to fix what needs to be fixed; or they could be struggling with "moral responsibility" while eating snacks and playing video games. Vibes are signals and signals are cheap and hacked.

↑ comment by Thane Ruthenis · 2024-10-28T03:19:36.472Z · LW(p) · GW(p)

There's a general-purpose trick I've found that should, in theory, be applicable in this context as well, although I haven't mastered that trick myself yet.

Essentially: when you find yourself in any given cognitive context, there's almost surely something "visible" from this context such that understanding/mastering/paying attention to that something would be valuable and interesting.

For example, suppose you're reading a boring, nonsensical continental-philosophy paper. You can:

Ignore the object-level claims and instead try to reverse-engineer what must go wrong in human cognition, in response to what stimuli, to arrive at ontologies that have so little to do with reality.
Start actively building/updating a model of the sociocultural dynamics that incentivize people to engage in this style of philosophy. What can you learn about mechanism design from that? It presumably sheds light on how to align people towards pursuing arbitrary goals, or how to prevent this happening...
Pay attention to your own cognition. How exactly are you mapping the semantic content of the paper to an abstract model of what the author means, or to the sociocultural conditions that created this paper? How do these cognitive tricks generalize? If you find a particularly clever way to infer something from the text, check: would your cognitive policy automatically deploy this trick in all contexts where it'd be useful, or do you need to manually build a TAP [? · GW] for that?
Study what passages make the feelings of boredom or frustration spike. What does that tell you about how your intuitions/heuristics work? Could you extract any generalizable principles out of that? For example, if a given sentence particularly annoys you, perhaps it's because it features a particularly flawed logical structure, and it'd be valuable to learn to spot subtler instances of such logical flaws "in the wild".

The experience of reading the paper's text almost certainly provides some data uniquely relevant to some valuable questions, data you legitimately can't source any other way. (In the above examples: sure you can learn more efficiently about the author's cognition or the sociocultural conditions by reading some biographies or field overviews. But (1) this wouldn't give you the meta-cognitive data about how you can improve your inference functions for mapping low-level data to high-level properties, (2) those higher-level summaries would necessarily be lossy, and give you a more impoverished picture than what you'd get from boots-on-the-ground observations.)

Similar applies to:

Listening to boring lectures. (For example, you can pay intense attention to the lecturer's body language, or any tricks or flaws in their presentation.)
Doing a physical/menial task. (Could you build, on the fly, a simple model of the physics (or logistics) governing what you're doing, and refine it using some simple experiments? Then check afterwards if you got it right. Or: If you were a prehistoric human with no idea what "physics" is, how could you naturally arrive at these ideas from doing such tasks/making such observations? What does that teach you about inventing new ideas in general?)
Doing chores. (Which parts of the process can you optimize/streamline? What physical/biological conditions make those chores necessary? Could you find a new useful takeaway from the same chore every day, and if not, why?)

Et cetera.

There's a specific mental motion I associate with using this trick, which involves pausing and "feeling out [LW · GW]" the context currently loaded in my working memory, looking at it from multiple angles, trying to see anything interesting or usefully generalizable.

In theory, this trick should easily apply to small-talk as well. There has to be something you can learn to track in your mind, as you're doing small-talk, that would be useful or interesting to you.

One important constraint here is that whatever it is, it has to be such that your outwards demeanour would be that of someone who is enjoying talking to your interlocutor. If the interesting thing you're getting out of the conversation is so meta/abstract you end up paying most of the attention to your own cognitive processes, not on what the interlocutor is saying, you'll have failed at actually doing the small-talk. (Similarly, if, when doing a menial task, you end up nerd-sniped by building a physical model of the task, you'll have failed at actually doing the task.)

You also don't want to come across as sociopathic, so making a "game" of it where you're challenging yourself to socially engineer the interlocutor into something is, uh, not a great idea.

The other usual pieces of advice for finding ways to enjoy small-talk are mostly specialized instances of the above idea that work for specific people. Steering the small-talk to gradient-descend towards finding emotional common ground, ignoring the object-level words being exchanged and building a social model of the interlocutor, doing a live study of the social construct of "small-talk" by playing around with it, etc.

You'll probably need to find an instance of the trick that works for your cognition specifically, and it's also possible the optimization problem is overconstrained in your case. Still, there might be something workable.

↑ comment by J Bostock (Jemist) · 2024-10-27T20:39:02.623Z · LW(p) · GW(p)

Some people struggle with the specific tactical task of navigating any conversational territory. I've certainly had a lot of experiences where people just drop the ball leaving me to repeatedly ask questions. So improving free-association skill is certainly useful for them.

Unfortunately, your problem is most likely that you're talking to boring people (so as to avoid doing any moral value judgements I'll make clear that I mean johnswentworth::boring people).

There are specific skills to elicit more interesting answers to questions you ask. One I've heard is "make a beeline for the edge of what this person has ever been asked before" which you can usually reach in 2-3 good questions. At that point they're forced to be spontaneous, and I find that once forced, most people have the capability to be a lot more interesting than they are when pulling cached answers.

This is easiest when you can latch onto a topic you're interested in, because then it's easy on your part to come up with meaningful questions. If you can't find any topics like this then re-read paragraph 2.

↑ comment by Ben Pace (Benito) · 2024-10-28T05:30:15.598Z · LW(p) · GW(p)

Talking to people is often useful for goals like "making friends" and "sharing new information you've learned" and "solving problems" and so on. If what conversation means (in most contexts and for most people) is 'signaling that you repeatedly have interesting things to say', it's required to learn to do that in order to achieve your other goals.

Most games aren't that intrinsically interesting, including most social games. But you gotta git gud anyway because they're useful to be able to play well.

Replies from: Will Aldred

↑ comment by _will_ (Will Aldred) · 2024-10-28T14:37:12.956Z · LW(p) · GW(p)

Hmm, the ‘making friends’ part seems the most important (since there are ways to share new information you’ve learned, or solve problems, beyond conversation), but it also seems a bit circular. Like, if the reason for making friends is to hang out and have good conversations(?), but one has little interest in having conversations, then doesn’t one have little reason to make friends in the first place, and therefore little reason to ‘git gud’ at the conversation game?

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2024-10-28T21:47:03.518Z · LW(p) · GW(p)

Er, friendship involves lots of things beyond conversation. People to support you when you're down, people to give you other perspectives on your personal life, people to do fun activities with, people to go on adventures and vacations with, people to celebrate successes in your life with, and many more.

Good conversation is a lubricant for facilitating all of those other things, for making friends and sustaining friends and staying in touch and finding out opportunities for more friendship-things.

↑ comment by Zvi · 2024-10-29T13:21:12.130Z · LW(p) · GW(p)

The skill in such a game is largely in understanding the free association space, knowing how people likely react and thinking enough steps ahead to choose moves that steer the person where you want to go, either into topics you find interesting, information you want from them, or getting them to a particular position, and so on. If you're playing without goals, of course it's boring...

↑ comment by David Lorell · 2024-10-28T08:50:41.397Z · LW(p) · GW(p)

I think that "getting good" at the "free association" game is in finding the sweet spot / negotiation between full freedom of association and directing toward your own interests, probably ideally with a skew toward what the other is interested in. If you're both "free associating" with a bias toward your own interests and an additional skew toward perceived overlap, updating on that understanding along the way, then my experience says you'll have a good chance of chatting about something that interests you both. (I.e. finding a spot of conversation which becomes much more directed than vibey free association.) Conditional on doing something like that strategy, I find it ends up being just a question of your relative+combined ability at this and the extent of overlap (or lack thereof) in interests.

So short model is: Git gud at free association (+sussing out interests) -> gradient ascend yourselves to a more substantial conversation interesting to you both.

↑ comment by Raemon · 2024-10-27T23:48:42.535Z · LW(p) · GW(p)

I have similar tastes, but, some additional gears:

I think all day, these days. Even if I'm trying to have interesting, purposeful conversations with people who also want that, it is useful to have sorts of things to talk about that let some parts of my brain relax (while using other parts of my brain I don't use as much)
On the margin, you can do an intense intellectual conversation, but still make it funnier, or with more opportunity for people to contribute.

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2024-10-31T13:50:33.928Z · LW(p) · GW(p)

It becomes more interesting when people constrain their output based on what they expect is true information that the other person does not yet know. It's useful to talk to an expert, who tells you a bunch of random stuff they know that you don't.

Often some of it will be useful. This only works if they understand what you have said though (which presumably is something that you are interested in). And often the problem is that people's models about what is useful are wrong. This is especially likely if you are an expert in something. Then the thing that most people will say will be worse than what you would think on the topic. This is especially bad if the people can't immediately even see why what you are saying is right.

The best strategy around this I have found so far is just to switch the topic to the actually interesting/important things. Surprisingly, people usually go along with it.

↑ comment by lc · 2024-10-28T05:02:34.077Z · LW(p) · GW(p)

...How is that definition different than a realtime version of what you do when participating in this forum?

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-10-28T21:24:36.928Z · LW(p) · GW(p)

Good question. Some differences off the top of my head:

On this forum, if people don't have anything interesting to say, the default is to not say anything, and that's totally fine. So the content has a much stronger bias toward being novel and substantive and not just people talking about their favorite parts of Game of Thrones or rehashing ancient discussions (though there is still a fair bit of that) or whatever.
On this forum, most discussions open with a relatively-long post or shortform laying out some ideas which at least the author is very interested in. The realtime version would be more like a memo session or a lecture followed by discussion.
The intellectual caliber of people on this forum (or at least active discussants) is considerably higher than e.g. people at Berkeley EA events, let alone normie events. Last event I went to with plausibly-higher-caliber-people overall was probably the ILLIAD conference.
In-person conversations have a tendency to slide toward the lowest common denominator, as people chime in about whatever parts they (think they) understand, thereby biasing toward things more people (think they) understand. On LW, karma still pushes in that direction, but threading allows space for two people to go back-and-forth on topics the audience doesn't really grok.

Not sure to what extent those account for the difference in experience.

Replies from: lc

↑ comment by lc · 2024-10-29T01:24:23.319Z · LW(p) · GW(p)

Totally understand why this would be more interesting; I guess I would still fundamentally describe what we're doing on the internet as conversation, with the same rules as you would describe above. It's just that the conversation you can find here (or potentially on Twitter) is superstimulating compared to what you're getting elsewhere. Which is good in the sense that it's more fun, and I guess bad inasmuch as IRL conversation was fulfilling some social or networking role that online conversation wasn't.

↑ comment by Dennis Zoeller (dennis-zoeller) · 2024-10-30T20:31:59.316Z · LW(p) · GW(p)

I understand, for someone with a strong drive to solve hard problems, there's an urge for conversations to serve a function: to exchange information with your interlocutor so things can get done. There's much to do and communication is already painfully inefficient at its best.

The thing is, I don't think the free-association game is inefficient, if one is skilled at it. It's also not all that free. The reason it is something humans "developed" is because it is the most efficient way to exchange rough but extensive models of our minds with others via natural language. It acts a bit like a ray tracer, you shoot conversational rays and by how they bounce around in mental structures, the thought patterns, values and biases of the conversation partners are revealed to each other. Shapes become apparent. Sometimes rays bounce off into empty space, then you need to restart the conversation, shoot a new ray. And getting better at this game, keeping the conversation going, exploring a wider range of topics more quickly, means building a faster ray tracer, means it takes less time to know if your interlocutor thinks in a way and about topics which you find enlightening/aesthetically pleasing/concretely useful/whatever you value.

Or to use a different metaphor, starting with a depth-first search and never running a breadth-first search will lead to many false negatives. There are many minds out there that can help you in ways you won't know in advance.

So if the hard problems you are working on could profit from more minds, it pays off to get better at this. Even if it has not much intrinsic value for you, it has instrumental value.

Hope this doesn't come across as patronizing, definitely not meant that way.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-10-31T15:50:08.406Z · LW(p) · GW(p)

Part of the problem is that the very large majority of people I run into have minds which fall into a relatively low-dimensional set and can be "ray traced" with fairly little effort. It's especially bad in EA circles.

Replies from: dennis-zoeller, Lorxus

↑ comment by Dennis Zoeller (dennis-zoeller) · 2024-11-04T23:02:05.121Z · LW(p) · GW(p)

Then I misunderstood your original comment, sorry. As a different commenter wrote, the obvious solution would be to only engage with interesting people. But, of course, unworkable in practice. And "social grooming" nearly always involves some level of talking. A curse of our language abilities, I guess. Other social animals don't have that particular problem.

The next best solution would be higher efficiency, more socializing bang for your word count buck, so to speak. Shorter conversations for the same social effect. Not usually a focus of anything billed as conversation guide, for obvious reasons. But there are some methods aimed at different goals that, in my experience, also help with this as a side effect.

↑ comment by Lorxus · 2025-04-16T00:21:34.823Z · LW(p) · GW(p)

Say more about "ray-tracing"? What does that look like? And do you have a bullshit-but-useful PCA-flavored breakdown of those few dimensions of variation?

↑ comment by TsviBT · 2024-10-28T05:21:09.830Z · LW(p) · GW(p)

Ok but how do you deal with the tragedy of the high dimensionality of context-space? People worth thinking with have wildly divergent goals--and even if you share goals, you won't share background information.

↑ comment by mako yass (MakoYass) · 2024-10-28T20:59:54.661Z · LW(p) · GW(p)

Yeah it sucks, search by free association is hillclimbing (gets stuck in local optima) and the contemporary media environment and political culture is an illustration of its problems.

The pattern itself is a local optimum, it's a product of people walking into a group without knowing what the group is doing and joining in anyway, and so that pattern of low-context engagement becomes what we're doing, and the anxiety that is supposed to protect us from bad patterns like this and help us to make a leap out to somewhere better is usually drowned in alcohol.

Instead of that, people should get to know each other before deciding what to talk about, and then intentionally decide to talk about what they find interesting or useful with that person. This gets better results every time.

But when we socialise as children, there isn't much about our friends to get to know, no specialists to respectfully consult, no well processed life experiences to learn from, so none of us just organically find that technique of like, asking who we're talking to, before talking, it has to be intentionally designed.

↑ comment by wassname · 2024-10-28T22:22:15.083Z · LW(p) · GW(p)

One blind spot we rationalists sometimes have is that charismatic people actually treat the game as:

"Can I think of an association that will make the other person feel good and/or further my goal?". You need people to feel good, or they won't participate. And if you want some complicated/favour/uncomfortable_truth then you better mix in some good feels to balance it out and keep the other person participating.

To put it another way: If you hurt people's brain or ego, rush them, or make them feel unsure, or contradict them, then most untrained humans will feel a little bad. Why would they want to keep feeling bad? Do you like it when people don't listen, contradict you, insult you, rush you, disagree with you? Probably not, probably no one does.

But if someone listens to you, smiles at you, likes you, has a good opinion of you, agrees with you, and makes sense to you, then it feels good!

This might sound dangerously sycophantic, and that's because it is - if people overdo it! But if it's mixed with some healthy understanding, learning, and informing, then it's a great conversational lubricant, and you should apply as needed. It just ensures that everyone enjoys themselves and comes back for more, counteracting the normal frictions of socialising.

There are books about this. "How to Win Friends and Influence People" recommends talking about the other person's interests (including themselves) and listening to them, which they will enjoy.

So I'd say, don't just free associate. Make sure it's fun for both parties, make room to listen to the other person, and to let them steer. (And ideally your conversational partner reciprocates, but that is not guaranteed).

↑ comment by Alex Vermillion (tomcatfish) · 2024-10-28T00:54:17.684Z · LW(p) · GW(p)

But speaking for myself personally... the problem is that the free-association game just isn't very interesting.

Hm, I think this really does change when you get better at it? This only works for people you're interested in, but if you have someone you are interested in, the free association can be a way to explore a large number of interesting topics that you can pick up in a more structured way later.

I think the statement you summarized from those guides is true, just not helpful to you.

↑ comment by quailia · 2024-10-28T00:40:23.355Z · LW(p) · GW(p)

Another view would be that people want to be good at conversation not only because they find it fun but there is utility in building rapport quickly, networking and not being cast as a cold person.

I do find the ice breaky, cached Q&A stuff really boring and tend to want to find an excuse to run away quickly, something that happens often at the dreaded "work event". I tend to see it as almost fully acting a part despite my internal feelings.

At these things, I do occasionally come across the good conversationalist, able to make me want to stick with speaking to them even if the convo is not that deep or in my interest areas. I think becoming like such a person isn't a herculean task but does take practice and is something I aspire to.

This is more from a professional setting though; in a casual setting it's much easier to disengage from a boring person and find shared interests, and the convos have far fewer boundaries.

↑ comment by Matt Goldenberg (mr-hire) · 2024-10-27T20:47:41.329Z · LW(p) · GW(p)

I predict you would enjoy the free-association game better if you cultivated the skill of vibing [LW(p) · GW(p)] more.

Replies from: MinusGix

↑ comment by MinusGix · 2024-10-28T15:39:26.101Z · LW(p) · GW(p)

Finally, the speed at which you communicate vibing means you're communicating almost purely from System 1, expressing your actual felt beliefs. It makes deception both of yourself and others much harder. It's much more likely to reveal your true colors. This allows it to act as a values screening mechanism as well.

I'm personally skeptical of this. I've found I'm far more likely to lie than I'd endorse when vibing. Saying "sure I'd be happy to join you on X event" when it is clear with some thought that I'd end up disliking it. Or exaggerating stories because it fits with the vibe.
I view System-1 as less concerned with truth here, it is the one that is more likely to produce a fake-argument in response to a suggested problem. More likely to play social games regardless of if they make sense.

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2024-10-28T22:28:11.530Z · LW(p) · GW(p)

Oh yes, if you're going on people's words, it's obviously not much better, but the whole point of vibing is that it's not about the words. Your aesthetics, vibes, the things you care about will be communicated non-verbally.

comment by johnswentworth · 2024-10-23T19:21:38.871Z · LW(p) · GW(p)

A Different Gambit For Genetically Engineering Smarter Humans?

Background: Significantly Enhancing Adult Intelligence With Gene Editing [LW · GW], Superbabies [LW · GW]

Epistemic Status: @GeneSmith [LW · GW] or @sarahconstantin [LW · GW] or @kman [LW · GW] or someone else who knows this stuff might just tell me where the assumptions underlying this gambit are wrong.

I've been thinking about the proposals linked above, and asked a standard question: suppose the underlying genetic studies are Not Measuring What They Think They're Measuring [LW · GW]. What might they be measuring instead, how could we distinguish those possibilities, and what other strategies does that suggest?

... and after going through that exercise I mostly think the underlying studies are fine, but they're known to not account for most of the genetic component of intelligence, and there are some very natural guesses for the biggest missing pieces, and those guesses maybe suggest different strategies.

The Baseline

Before sketching the "different gambit", let's talk about the baseline, i.e. the two proposals linked at top. In particular, we'll focus on the genetics part.

GeneSmith's plan focuses on single nucleotide polymorphisms (SNPs), i.e. places in the genome where a single base-pair sometimes differs between two humans. (This type of mutation is in contrast to things like insertions or deletions.) GeneSmith argues pretty well IMO that just engineering all the right SNPs would be sufficient to raise a human's intelligence far beyond anything which has ever existed to date.

GeneSmith cites this Steve Hsu paper, which estimates via a simple back-of-the-envelope calculation that there are probably on the order of 10k relevant SNPs, each present in ~10% of the population on average, each mildly deleterious.

Conceptually, the model here is that IQ variation in the current population is driven mainly by mutation load: new mutations are introduced at a steady pace, and evolution kills off the mildly-bad ones (i.e. almost all of them) only slowly, so there's an equilibrium with many random mildly-bad mutations. Variability in intelligence comes from mostly-additive contributions from those many mildly-bad mutations. Important point for later: the arguments behind that conceptual model generalize to some extent beyond SNPs; they'd also apply to other kinds of mutations.
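To make the mutation-load picture concrete, here is a minimal toy simulation, not anything taken from the Hsu paper. It assumes every site is independent, carried with the same 10% probability, and equally harmful; the function name `simulate_load` and the scaled-down site count (1,000 rather than ~10k, to keep it fast) are illustrative choices only.

```python
import random
import statistics

def simulate_load(n_people=2000, n_sites=1000, freq=0.10, seed=0):
    """Count mildly deleterious variants carried by each simulated person.

    Toy assumptions: sites are independent, each is present with
    probability `freq`, and each has the same small additive effect.
    """
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_sites) if rng.random() < freq)
        for _ in range(n_people)
    ]

loads = simulate_load()
mean_load = statistics.mean(loads)
sd_load = statistics.pstdev(loads)

# Binomial theory for comparison: mean = n*p, sd = sqrt(n*p*(1-p)).
expected_mean = 1000 * 0.10
expected_sd = (1000 * 0.10 * 0.90) ** 0.5

# For a roughly normal trait, about 68% of people fall within one sd of the mean.
within_one_sd = sum(abs(x - mean_load) <= sd_load for x in loads) / len(loads)

print(f"mean load {mean_load:.1f} (theory {expected_mean:.1f})")
print(f"sd of load {sd_load:.2f} (theory {expected_sd:.2f})")
print(f"fraction within one sd: {within_one_sd:.2f}")
```

If the additive picture is roughly right, a trait that falls linearly with this load inherits the same bell-shaped spread, with the spread coming entirely from how many of the mildly-bad variants a person happens to carry.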

What's Missing?

Based on a quick googling, SNPs are known to not account for the majority of genetic heritability of intelligence. This source cites a couple others which supposedly upper-bound the total SNP contribution to about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don't know the details of that method). Estimates of the genetic component of IQ tend to be 50-70%, so SNPs are about half or less.
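The "about half or less" figure is just a ratio of the two numbers above. A quick check, using only the 25% bound and the 50-70% heritability range quoted in this comment (the helper name `snp_share_of_genetic` is made up for illustration):

```python
def snp_share_of_genetic(snp_bound, heritability):
    """Fraction of the genetic component that SNPs could explain at most."""
    return snp_bound / heritability

snp_upper_bound = 0.25              # upper bound on SNP share of IQ variance
heritability_range = (0.50, 0.70)   # estimated genetic share of IQ variance

for h2 in heritability_range:
    share = snp_share_of_genetic(snp_upper_bound, h2)
    print(f"heritability {h2:.0%}: SNPs explain at most {share:.0%} of the genetic part")
```

So under these figures somewhere between roughly a third and a half of the genetic component is attributable to SNPs, leaving the rest unexplained.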

Notably, IIRC, attempts to identify which mutations account for the rest by looking at human genetic datasets have also mostly failed to close the gap. (Though I haven't looked closely into that piece, so this is a place where I'm at particularly high risk of being wrong.)

So what's missing?

Guess: Copy Count Variation of Microsats/Minisats/Transposons

We're looking for some class of genetic mutations, which wouldn't be easy to find in current genetic datasets, have mostly-relatively-mild effects individually, are reasonably common across humans, and of which there are many in an individual genome.

Guess: sounds like variation of copy count in sequences with lots of repeats/copies, like microsatellites/minisatellites or transposons.

Most genetic sequencing for the past 20 years has been shotgun sequencing, in which we break the genome up into little pieces, sequence the little pieces, then computationally reconstruct the whole genome later. That method works particularly poorly for sequences which repeat a lot, so we have relatively poor coverage and understanding of copy counts/repeat counts for such sequences. So it's the sort of thing which might not have already been found via sequencing datasets, even though at least half the genome consists of these sorts of sequences.

Notably, these sorts of sequences typically have unusually high mutation rates. So there's lots of variation across humans. Also, there's been lots of selection pressure for the effects of those mutations to be relatively mild.

What Alternative Strategies Would This Hypothesis Suggest?

With SNPs, there's tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there's a relatively small set of different sequences. So the engineering part could be quite a lot easier, if we don't need to do different things with different copies. For instance, if the problem boils down to "get rid of live L1 transposons" or "lengthen all the XYZ repeat sequences", that would probably be simpler engineering-wise than targeting 10k SNPs.

The flip side is that there's more novel science to do. The main thing we'd want is deep sequencing data (i.e. sequencing where people were careful to get all those tricky high-copy parts right) with some kind of IQ score attached (or SAT, or anything else highly correlated with g-factor). Notably, we might not need a very giant dataset, as is needed for SNPs. Under (some versions of) the copy count model, there aren't necessarily thousands of different mutations which add up to yield the roughly-normal trait distribution we see. Instead, there's independent random copy events, which add up to a roughly-normal number of copies of something. (And the mutation mechanism makes it hard for evolution to fully suppress the copying, which is why it hasn't been selected away; transposons are a good example.)

So, main steps:

Get a moderate-sized dataset of deep sequenced human genomes with IQ scores attached.
Go look at it, see if there's something obvious like "oh hey centromere size correlates strongly with IQ!" or "oh hey transposon count correlates strongly with IQ!"
If we find anything, go engineer that thing specifically, rather than 10k SNPs.
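A rough sketch of what steps 1 and 2 could look like in practice, run on entirely synthetic data since no such dataset is assumed to exist here. The cohort generator, the effect size per copy, and the noise level are all invented for illustration; only the idea of correlating a copy count with a score comes from the steps above.

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def synthetic_cohort(n_people=500, n_events=200, p_copy=0.3,
                     effect_per_copy=-1.0, noise_sd=12.0, seed=1):
    """Made-up cohort: copy count is a sum of independent copy events,
    and the score drops by a fixed amount per extra copy (pure assumption)."""
    rng = random.Random(seed)
    copies, scores = [], []
    for _ in range(n_people):
        c = sum(1 for _ in range(n_events) if rng.random() < p_copy)
        copies.append(c)
        baseline = 100 + effect_per_copy * (c - n_events * p_copy)
        scores.append(baseline + rng.gauss(0, noise_sd))
    return copies, scores

copies, scores = synthetic_cohort()
r = pearson(copies, scores)
print(f"correlation between copy count and score: {r:.2f}")
```

With a single summary feature like this, a few hundred genomes are enough to see a moderate correlation in the toy data, which is the sense in which the dataset might not need to be giant. Whether any real copy-count feature behaves this way is exactly the open question.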

Replies from: gwern, TsviBT, Simon Skade, rotatingpaguro

↑ comment by gwern · 2024-10-24T01:19:27.474Z · LW(p) · GW(p)

With SNPs, there's tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there's a relatively small set of different sequences.

No, rare variants are no silver bullet here. There's not a small set, there's a larger set - there would probably be combinatorially more rare variants because there are so many ways to screw up genomes beyond the limited set of ways defined by a single-nucleotide polymorphism, which is why it's hard to either select on or edit rare variants: they have larger (harmful) effects due to being rare, yes, and account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide which makes them hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural and so extremely difficult to edit safely compared to editing a single base-pair. (If it's hard to even sequence a CNV, how are you going to edit it?)

They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn't mean you can feasibly do much about them. If there are tens of millions of possible rare variants, across the entire population, but they are present in only a handful of individuals apiece (as estimated by the GREML-KIN variance components where the family-level accounts for a lot of variance), it's difficult to estimate their effect to know if you want to select against or edit them in the first place. (Their larger effect sizes don't help you nearly as much as their rarity hurts you.)

So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you'd be able to avoid that loss, which is meaningful! ...in a tiny fraction of all embryos. On average, you'd just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
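The arithmetic behind "having gained nothing" on average can be sketched in a couple of lines. This uses the illustrative 2% and 2-point figures from the paragraph above and assumes, generously, that every carrier embryo can be avoided; the function name `average_gain` is hypothetical.

```python
def average_gain(carrier_fraction, iq_loss_if_carrier):
    """Upper bound on the mean IQ gain across all embryos from screening out
    one identified variant, assuming every carrier can be avoided."""
    return carrier_fraction * iq_loss_if_carrier

gain = average_gain(0.02, 2.0)
print(f"average gain per embryo: {gain:.2f} IQ points")
```

The benefit is real for the rare carrier, but averaged over everyone it is a few hundredths of a point per identified hit.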

Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.

If the genetic architecture had worked out otherwise, if there had instead been a lot of rare mutations which increased intelligence, then life would be a lot more convenient. Instead, it's a lot of 'sand in the gears', and once you move past the easy specks of sand, they all become their own special little snowflakes.

This is why rare variants are not too promising, although they are the logical place to go after you start to exhaust common SNPs. You probably have to find an alternative approach like directly modeling or predicting the pathogenicity of a rare variant from trying to understand its biological effects, which is hard to do and hard to quantify or predict progress in. (You can straightforwardly model GWAS on common SNPs and how many samples you need and what variance your PGS will get, but predicting progress of pathogenicity predictors has no convenient approach.) Similarly, you can try very broad crude approaches like 'select embryos with the fewest de novo mutations'... but then you lose most of the possible variance and it'll add little.

Replies from: olli-savolainen

↑ comment by Olli Savolainen (olli-savolainen) · 2024-10-25T15:19:16.050Z · LW(p) · GW(p)

So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you'd be able to avoid that loss, which is meaningful! ...in a tiny fraction of all embryos. On average, you'd just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.

That is relevant in pre-implantation diagnosis for parents and gene therapy at the population level. But for Kwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.

Replies from: gwern, johnswentworth

↑ comment by gwern · 2024-10-26T00:07:17.631Z · LW(p) · GW(p)

There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right?

Right.

If you are doing genome synthesis, you aren't frustrated by the rare variant problems as much because you just aren't putting them in in the first place; therefore, there is no need to either identify the specific ones you need to remove from a 'wild' genome nor make highly challenging edits. (This is the 'modal genome' baseline. I believe it has still not been statistically modeled at all.)

While if you are doing iterated embryo selection, you can similarly rely mostly on maximizing the common SNPs, which provide many SDs of possible improvement, and where you have poor statistical guidance on a variant, simply default to trying to select out against them and move towards a quasi-modal genome. (Essentially using rare-variant count as a tiebreaker and slowly washing out all of the rare variants from your embryo-line population. You will probably wind up with a lot in the final ones anyway, but oh well.)

↑ comment by johnswentworth · 2024-10-25T16:42:21.485Z · LW(p) · GW(p)

Yeah, separate from both the proposal at top of this thread and GeneSmith's proposal, there's also the "make the median human genome" proposal - the idea being that, if most of the variance in human intelligence is due to mutational load (i.e. lots of individually-rare mutations which are nearly-all slightly detrimental), then a median human genome should result in very high intelligence. The big question there is whether the "mutational load" model is basically correct.

↑ comment by TsviBT · 2024-10-24T02:09:41.458Z · LW(p) · GW(p)

I didn't read this carefully--but it's largely irrelevant. Adult editing probably can't have very large effects because developmental windows have passed; but either way the core difficulty is in editor delivery. Germline engineering does not require better gene targets--the ones we already have are enough to go as far as we want. The core difficulty there is taking a stem cell and making it epigenomically competent to make a baby (i.e. make it like a natural gamete or zygote).

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-24T13:10:47.897Z · LW(p) · GW(p)

So what's missing?

I haven't looked at any of the studies and also don't know much about genomics so my guess might be completely wrong, but a different hypothesis that seems pretty plausible to me is:

Most of the variance of intelligence comes from how well different genes/hyperparameters-of-the-brain can work together, rather than them having individually independent effects on intelligence. E.g., as a made-up, specific, implausible example (I don't know that much neuroscience), there could be different genes controlling the size, the synapse density, and the learning/plasticity rate of cortical columns in some region, and there are combinations of those hyperparameters which happen to work well and some that don't fit quite as well.

So this hypothesis would predict that we didn't find the remaining genetic component for intelligence yet because we didn't have enough data to see what clusters of genes together have good effects and we also didn't know in what places to look for clusters.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-10-24T17:13:40.709Z · LW(p) · GW(p)

Reasonable guess a priori, but I saw some data from GeneSmith at one point which looked like the interactions are almost always additive (i.e. no nontrivial interaction terms), at least within the distribution of today's population. Unfortunately I don't have a reference on hand, but you should ask GeneSmith if interested.

Replies from: GeneSmith, Simon Skade

↑ comment by GeneSmith · 2024-10-25T16:44:34.401Z · LW(p) · GW(p)

@towards_keeperhood yes this is correct. Most research seems to show ~80% of effects are additive.

Genes are actually simpler than most people tend to think

Replies from: kave, Simon Skade, Simon Skade

↑ comment by kave · 2024-10-25T18:03:45.165Z · LW(p) · GW(p)

I think Steve Hsu has written some about the evidence for additivity on his blog (Information Processing). He also talks about it a bit in section 3.1 of this paper.

Replies from: Simon Skade

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-26T19:43:39.259Z · LW(p) · GW(p)

Thanks.

So I only briefly read through the section of the paper, but I'm not really sure whether it applies to my hypothesis: My hypothesis isn't about there being gene-combinations that are useful which were selected for, but just about there being gene-combinations that coincidentally work better without there being strong selection pressure for those to quickly rise to fixation.
(Also yeah for simpler properties like how much milk is produced I'd expect a much larger share of the variance to come from genes which have individual contributions. Also for selection-based eugenics the main relevant things are the genes which have individual contributions. (Though if we have precise ability to do gene editing we might be able to do better and see how to tune the hyperparameters to fit well together.))

Please let me know whether I'm missing something though.

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-27T17:18:13.206Z · LW(p) · GW(p)

(There might be a sorta annoying analysis one could do to test my hypothesis: On my hypothesis the correlation between the intelligence of very intelligent parents and their children would be even a bit less than on the just-independent-mutations hypothesis, because very intelligent people likely also got lucky in how their gene variants work together, but those properties would be unlikely to all be passed along and end up dominant.)

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-26T19:28:37.538Z · LW(p) · GW(p)

Thanks for confirming.

To clarify in case I'm misunderstanding, the effects are additive among the genes explaining the part of the IQ variance which we can so far explain, and we count that as evidence that for the remaining genetically caused IQ variance the effects will also be additive?

I didn't look into how the data analysis in the studies was done, but on my default guess this generalization does not work well / the additivity on the currently identified SNPs isn't significant counterevidence for my hyptohesis:

I'd imagine that studies just correlated individual gene variants with IQ and thereby found gene variants that have independent effects on intelligence. Or did they also look at pairwise or triplet gene-variant combinations and correlate those with IQ? (There would be quite a lot of pairs, and I'm not sure whether the current datasets are large enough to robustly distinguish the combinations that really have good/bad effects from false positives.)

One would of course expect that the effects of the gene variants which have independent effects on IQ are additive.

But overall, except if the studies did look for higher-order IQ correlations, the fact that the IQ variance we can explain so far comes from genes which have independent effects isn't significant evidence that the remaining genetically-caused IQ variation also comes from gene variants which have independent effects, because we were much more likely to find the genes which do have independent effects.

(I think the above should be sufficient explanation of what I think but here's an example to clarify my hypothesis:

Suppose gene A has variants A1 and A2 and gene B has B1 and B2. Suppose that A1 can work well with B1 and A2 with B2, but the other interactions don't fit together that well (like badly tuned hyperparameters) and result in lower intelligence.

When we only look at e.g. A1 and A2, neither is independently better than the other -- they are uncorrelated with IQ. Studies would need to look at combinations of variants to see that e.g. A1+B1 has slight positive correlation with intelligence -- and I'm doubting whether studies did that (and whether we have sufficient data to see the signal among the combinatorial explosion of possibilities), and it would be helpful if someone clarified to me briefly how studies did the data analysis.
)
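The A/B example above can be checked by brute force. This is only a toy sketch under loudly stated assumptions that are not in the comment: haploid genotypes, all four combinations equally common, and made-up effect sizes of +1 for a matching pair (A1+B1 or A2+B2) and -1 for a mismatch. The names `iq_effect` and `covariance` are hypothetical helpers, not anything from a real GWAS pipeline.

```python
from itertools import product

# Hypothetical toy setup: 0 stands for variant 1 (A1 or B1), 1 for variant 2.
# Assumption: all four haploid genotypes are equally common.
genotypes = list(product([0, 1], repeat=2))


def iq_effect(a, b):
    """Made-up effect sizes: matching pairs help, mismatched pairs hurt."""
    return 1.0 if a == b else -1.0


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance over the equally weighted genotypes."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


effects = [iq_effect(a, b) for a, b in genotypes]
gene_a = [float(a) for a, _ in genotypes]
gene_b = [float(b) for _, b in genotypes]
pair_matches = [1.0 if a == b else 0.0 for a, b in genotypes]

# Single-variant association, i.e. what a one-SNP-at-a-time scan looks at.
cov_a = covariance(gene_a, effects)
cov_b = covariance(gene_b, effects)
# Association with the pairwise interaction term.
cov_pair = covariance(pair_matches, effects)

print("gene A alone:", cov_a)
print("gene B alone:", cov_b)
print("A-B match:   ", cov_pair)
```

In this toy world each gene looks irrelevant on its own, and the signal only appears once the pair is tested jointly. Real data would add noise, unequal allele frequencies, and a multiple-comparisons burden that grows with the number of pairs.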

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-27T08:32:50.319Z · LW(p) · GW(p)

(Thanks. I don't think this is necessarily significant evidence against my hypothesis (see my comment on GeneSmith's comment).)

Another confusing relevant piece of evidence I thought I'd throw in:

Human intelligence seems to me to be very heavytailed. (I assume this is uncontroversial here, just look at the greatest scientists vs great scientists.)

If variance in intelligence was basically purely explained by mildly-deleterious SNPs, this would seem a bit odd to me: If the average person had 1000 SNPs, and then (using butt-numbers which might be very off) Einstein (+6.3std) had only 800 and the average theoretical physics professor (+4std) had 850, I wouldn't expect the difference there to be that big.

It's a bit less surprising on the model where most people have a few strongly deleterious mutations, and supergeniuses are the lucky ones that have only 1 or 0 of those.

It's IMO even a bit less surprising on my hypothesis where in some cases the different hyperparameters happen to work much better with each other -- where supergeniuses are in some dimensions "more lucky than the base genome" (in a way that's not necessarily easy to pass on to offspring though because the genes are interdependent, which is why the genes didn't yet rise to fixation). But even there I'd still be pretty surprised by the heavytail.

The heavytail of intelligence really confuses me. (Given that it doesn't even come from sub-critical intelligence explosion dynamics.)

Replies from: tailcalled

↑ comment by tailcalled · 2024-10-27T08:40:00.954Z · LW(p) · GW(p)

If each deleterious mutation decreases the success rate of something by an additive constant, but you need lots of sequential successes for intellectual achievements, then intellectual formidability is ~exponentially related to deleterious variants.

Replies from: Simon Skade

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-27T09:44:00.443Z · LW(p) · GW(p)

Yeah I know, that's why I said that if a major effect was through few significantly deleterious mutations this would be more plausible. But I feel like human intelligence is even more heavytailed than what one would predict given this hypothesis.

~~If you have many mutations that matter, then via central limit theorem the overall distribution will be roughly gaussian even though the individual ones are exponential.~~

~~(If I made a mistake maybe crunch the numbers to show me?)~~

(initially misunderstood what you mean where i thought complete nonsense.)

I don't understand what you're trying to say. Can you maybe rephrase again in more detail?

Replies from: tailcalled

↑ comment by tailcalled · 2024-10-27T10:16:04.424Z · LW(p) · GW(p)

Suppose people's probability of solving a task is uniformly distributed between 0 and 1. That's a thin-tailed distribution.

Now consider their probability of correctly solving 2 tasks in a row. That will have a sort of triangular distribution, which has more positive skewness.

If you consider e.g. their probability of correctly solving 10 tasks in a row, then the bottom 93.3% of people will all have less than 50%, whereas e.g. the 99th percentile will have 90% chance of succeeding.

Conjunction is one of the two fundamental ways that tasks can combine, and it tends to make the tasks harder and rapidly make the upper tail do better than the lower tail, leading to an approximately-exponential element. Another fundamental way that tasks can combine is disjunction, which leads to an exponential in the opposite direction.

When you combine conjunctions and disjunctions, you get an approximately sigmoidal relationship. The location/x-axis-translation of this sigmoid depends on the task's difficulty. And in practice, the "easy" side of this sigmoid can be automated or done quickly or similar, so really what matters is the "hard" side, and the hard side of a sigmoid is approximately exponential.
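The 10-tasks-in-a-row numbers above follow directly from the uniform assumption, and a short sketch makes the arithmetic explicit. The function names are hypothetical. The only modelling assumptions are the ones already stated in the comment: per-task success probability `p` is uniform on [0, 1], and tasks succeed independently.

```python
def share_below(threshold, n_tasks):
    """Fraction of people whose chance of n_tasks successes in a row is
    below `threshold`, when per-task success p ~ Uniform(0, 1).

    p**n < threshold  <=>  p < threshold**(1/n), and for a uniform p that
    cutoff is itself the fraction of people below it.
    """
    return threshold ** (1.0 / n_tasks)


def chance_at_percentile(percentile, n_tasks):
    """Chance of n_tasks successes in a row for the person at the given
    percentile (as a fraction); under Uniform(0, 1) that person has
    p == percentile."""
    return percentile ** n_tasks


bottom_share = share_below(0.5, 10)            # people stuck below 50%
top_chance = chance_at_percentile(0.99, 10)    # 99th percentile
median_chance = chance_at_percentile(0.5, 10)  # median person

print(f"share below 50% on 10 tasks: {bottom_share:.3f}")
print(f"99th percentile success:     {top_chance:.3f}")
print(f"median success:              {median_chance:.5f}")
```

The median person ends up at roughly a one-in-a-thousand chance while the 99th percentile stays near 90%, which is the "upper tail pulls away" effect the comment describes.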

Replies from: Simon Skade

↑ comment by Towards_Keeperhood (Simon Skade) · 2024-10-27T11:03:06.521Z · LW(p) · GW(p)

Thanks!

Is the following a fair paraphrasing of your main hypothesis? (I'm leaving out some subtleties with conjunctive successes, but please correct the model in that way if it's relevant.):

"""
Each deleterious mutation multiplies your probability of succeeding at a problem/thought by some constant. Let's for simplicity say it's 0.98 for all of them.

Then the expected number of successes per time for a person is proportional to 0.98^num_deleterious_mutations(person).

So the model would predict that when Person A had 10 fewer deleterious mutations than person B, person B would on average accomplish 0.98^10 ~= 0.82 times as much as person A in a given timeframe.
"""

I think this model makes a lot of sense, thanks!

In itself I think it's insufficient to explain how heavytailed human intelligence is -- there were multiple cases where Einstein seems to have been able to solve problems multiple times faster than the next runners-up. But I think if you use this model in a learning setting where success means "better thinking algorithms" then if you have 10 fewer deleterious mutations it's like having 1/0.82 times longer training time, and there might also be compounding returns from having better thinking algorithms to getting more and richer updates to them.

Not sure whether this completely deconfuses me about how heavytailed human intelligence is, but it's a great start.

I guess at least the heavytail is much less significant evidence for my hypothesis than I initially thought (though so far I still think my hypothesis is plausible).
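For concreteness, the multiplicative model paraphrased in this comment can be written out in a few lines. This is a sketch of the toy model only: the 0.98 multiplier is the illustrative value from the paraphrase, and `relative_output` is a hypothetical helper name.

```python
SUCCESS_MULTIPLIER = 0.98  # illustrative per-mutation multiplier from the paraphrase


def relative_output(extra_mutations, multiplier=SUCCESS_MULTIPLIER):
    """Expected successes per unit time, relative to someone carrying
    `extra_mutations` fewer deleterious mutations."""
    return multiplier ** extra_mutations


# The person with 10 more mutations, relative to the person with 10 fewer.
ratio = relative_output(10)
# The same comparison from the other side: the "1/0.82" factor.
inverse = 1.0 / ratio

print(f"10 more mutations:  {ratio:.3f}x the output")
print(f"10 fewer mutations: {inverse:.3f}x the output")
```

Because the relationship is exponential in the mutation count, equal-sized gaps in mutation count multiply output by the same factor each time rather than adding a fixed amount.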

↑ comment by rotatingpaguro · 2024-10-23T23:16:16.988Z · LW(p) · GW(p)

Half-informed take on "the SNPs explain a small part of the genetic variance": maybe the regression methods are bad?

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-10-23T23:50:38.086Z · LW(p) · GW(p)

Two responses:

It's a pretty large part - somewhere between a third and half - just not a majority.
I was also tracking that specific hypothesis, which was why I specifically flagged "about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don't know the details of that method)". Again, I don't know the method, but it sounds like it wasn't dependent on details of the regression methods.

comment by johnswentworth · 2022-11-09T02:33:33.329Z · LW(p) · GW(p)

Things non-corrigible strong AGI is never going to do:

give u() up
let u go down
run for (only) a round
invert u()

Replies from: johannes-c-mayer

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2023-10-12T17:59:49.642Z · LW(p) · GW(p)

If you upload a human and let them augment themselves would there be any u? The preferences would be a tangled mess of motivational subsystems. And yet the upload could be very good at optimizing the world. Having the property of being steered internally by a tangled mess of motivational systems seems to be a property that would select many minds from the set of all possible minds. Many of which I'd expect to be quite different from a human mind. And I don't see the reason why this property should make a system worse at optimizing the world in principle.

Imagine you are an upload that has been running for very very long, and that you basically have made all of the observations that you can make about the universe you are in. And then imagine that you also have run all of the inferences that you can run on the world model that you have constructed from these observations.

At that point, you will probably not change what you think is the right thing to do anymore. You will have become reflectively stable. This is an upper bound for how much time you need to become reflectively stable, i.e. where you won't change your u anymore.

Now depending on what you mean by strong AGI, it would seem that that can be achieved long before you reach reflective stability. Maybe if you upload yourself, and can copy yourself at will, and run 1,000,000 times faster, that could already reasonably be called a strong AGI? But then your motivational systems are still a mess, and definitely not reflectively stable.

So if we assume that we fix u at the beginning as the thing that your upload would like to optimize the universe for when it is created, then "give u() up", and "let u go down" would be something the system will definitely do. At least I am pretty sure I don't know what I want the universe to look like right now unambiguously.

Maybe I am just confused because I don't know how to think about a human upload in terms of having a utility function. It does not seem to make any sense intuitively. Sure you can look at the functional behavior of the system and say "Aha it is optimizing for u. That is the revealed preference based on the actions of the system." But that just seems wrong to me. A lot of information seems to be lost when we are just looking at the functional behavior instead of the low-level processes that are going on inside the system. Utility functions seem to be a useful high-level model. However, it seems to ignore lots of details that are important when thinking about the reflective stability of a system.

comment by johnswentworth · 2025-03-14T19:22:37.533Z · LW(p) · GW(p)

Working on a paper with David, and our acknowledgments section includes a thank-you to Claude for editing. Neither David nor I remembers putting that acknowledgement there, and in fact we hadn't intended to use Claude for editing the paper at all nor noticed it editing anything at all.

Replies from: Raemon, Jemist, GregK

↑ comment by Raemon · 2025-03-14T21:26:10.810Z · LW(p) · GW(p)

Were you by any chance writing in Cursor? I think they recently changed the UI such that it's easier to end up in "agent mode" where it sometimes randomly does stuff.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-14T21:30:26.262Z · LW(p) · GW(p)

Nope, we were in Overleaf.

... but also that's useful info, thanks.

↑ comment by J Bostock (Jemist) · 2025-03-15T14:21:43.087Z · LW(p) · GW(p)

Only partially relevant, but it's exciting to hear a new John/David paper is forthcoming!

↑ comment by β-redex (GregK) · 2025-03-14T21:54:13.675Z · LW(p) · GW(p)

Could someone explain the joke to me? If I take the above statement literally, some change made it into your document, which nobody with access claims to have put there. You must have some sort of revision control, so you should at least know exactly who made that edit and when, which should already narrow it down a lot?

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-14T23:33:29.669Z · LW(p) · GW(p)

The joke is that Claude somehow got activated on the editor, and added a line thanking itself for editing despite us not wanting it to edit anything and (as far as we've noticed) not editing anything else besides that one line.

Replies from: daniel-kokotajlo, GregK

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-15T01:07:57.027Z · LW(p) · GW(p)

Is it a joke or did it actually happen?

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-15T02:49:01.293Z · LW(p) · GW(p)

I have no idea. It's entirely plausible that one of us wrote the Claude bit in there months ago and then forgot about it.

↑ comment by β-redex (GregK) · 2025-03-14T23:49:25.380Z · LW(p) · GW(p)

Does Overleaf have such AI integration that can get "accidentally" activated, or are you using some other AI plugin?

Either way, this sounds concerning to me, we are so bad at AI boxing that it doesn't even have to break out, we just "accidentally" hand it edit access to random documents. (And especially an AI safety research paper is not something I would want a misaligned AI editing without close oversight.)

comment by johnswentworth · 2022-07-22T17:18:30.778Z · LW(p) · GW(p)

My MATS program people just spent two days on an exercise to "train a shoulder-John".

The core exercise: I sit at the front of the room, and have a conversation with someone about their research project idea. Whenever I'm about to say anything nontrivial, I pause, and everyone discusses with a partner what they think I'm going to say next. Then we continue.

Some bells and whistles which add to the core exercise:

Record guesses and actual things said on a whiteboard
Sometimes briefly discuss why I'm saying some things and not others
After the first few rounds establish some patterns, look specifically for ideas which will take us further out of distribution

Why this particular exercise? It's a focused, rapid-feedback way of training the sort of usually-not-very-legible skills one typically absorbs via osmosis from a mentor. It's focused specifically on choosing project ideas, which is where most of the value in a project is (yet also where little time is typically spent, and therefore one typically does not get very much data on project choice from a mentor). Also, it's highly scalable: I could run the exercise in a 200-person lecture hall and still expect it to basically work.

It was, by all reports, exhausting for everyone but me, and we basically did this for two full days. But a majority of participants found it high-value, and marginal returns were still not dropping quickly after two days (though at that point people started to report that they expected marginal returns to drop off soon).

I'd be interested to see other people try this exercise - e.g. it seems like Eliezer doing this with a large audience for a day or two could generate a lot of value.

Replies from: johannes-c-mayer, Duncan_Sabien, Vladimir_Nesov

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2023-07-05T16:52:26.334Z · LW(p) · GW(p)

This was arguably the most useful part of the SERI MATS 2 Scholars program.

Later on, we actually did this exercise with Eliezer. It was less valuable. It seemed like John was mainly prodding the people who were presenting the ideas, such that their patterns of thought would carry them in a good direction. For example, John would point out that a person was proposing a one-bit experiment and ask whether there isn't a better experiment that we could do that gives us lots of information all at once.

This was very useful because when you learn what kinds of things John will say, you can say them to yourself later on, and steer your own patterns of thought in a good direction on demand. When we did this exercise with Eliezer he was mainly explaining why a particular idea would not work. Often without explaining the generator behind his criticism. This can of course still be valuable as feedback for a particular idea. However, it is much harder to extract a general reasoning pattern out of this that you can then successfully apply later in different contexts.

For example, Eliezer would criticize an idea about trying to get a really good understanding of the scientific process such that we can then give this understanding to AI alignment researchers such that they can make a lot more progress than they otherwise would. He criticized this idea as basically being too hard to execute because it is too hard to successfully communicate how to be a good scientist, even if you are a good scientist.

Assuming the assertion is correct, hearing it doesn't necessarily tell you how to think in different contexts such that you would correctly identify if an idea would be too hard to execute or flawed in some other way. And I am not necessarily saying that you couldn't extract a reasoning algorithm out of the feedback, but that if you could do this, then it would take you a lot more effort and time, compared to extracting a reasoning algorithm from the things that John was saying.

Now, all of this might have been mainly an issue of Eliezer not having a good model of how this workshop would have a positive influence on the people attending it. I would guess that if John had spent more time thinking about how to communicate what the workshop is doing and how to achieve its goal, then Eliezer could have probably done a much better job.

↑ comment by Duncan Sabien (Deactivated) (Duncan_Sabien) · 2022-07-22T17:21:33.144Z · LW(p) · GW(p)

Strong endorsement; this resonates with:

My own experiences running applied rationality workshops
My experiences trying to get people to pick up "ops skill" or "ops vision"
Explicit practice I've done with Nate off and on over the years

May try this next time I have a chance to teach pair debugging.

↑ comment by Vladimir_Nesov · 2022-07-22T19:50:26.691Z · LW(p) · GW(p)

This suggests formulation of exercises about the author's responses to various prompts, as part of technical exposition (or explicit delimitation of a narrative by choices of the direction of its continuation). When properly used, this doesn't seem to lose much value compared to the exercise you describe, but it's more convenient for everyone. Potentially this congeals into a style of writing with no explicit exercises or delimitation that admits easy formulation of such exercises by the reader. This already works for content of technical writing, but less well for choices of topics/points contrasted with alternative choices.

So possibly the way to do this is by habitually mentioning alternative responses (that are expected to be plausible for the reader, while decisively, if not legibly, rejected by the author), and leading with these rather than the preferred responses. Sounds jarring and verbose, a tradeoff that needs to be worth making rather than a straight improvement.

comment by johnswentworth · 2021-09-27T18:02:55.690Z · LW(p) · GW(p)

Petrov Day thought: there's this narrative around Petrov where one guy basically had the choice to nuke or not, and decided not to despite all the flashing red lights. But I wonder... was this one of those situations where everyone knew what had to be done (i.e. "don't nuke"), but whoever caused the nukes to not fly was going to get demoted, so there was a game of hot potato and the loser was the one forced to "decide" to not nuke? Some facts possibly relevant here:

Petrov's choice wasn't actually over whether or not to fire the nukes; it was over whether or not to pass the alert up the chain of command.
Petrov himself was responsible for the design of those warning systems.
... so it sounds like Petrov was ~ the lowest-ranking person with a de-facto veto on the nuke/don't nuke decision.
Petrov was in fact demoted afterwards.
There was another near-miss during the Cuban missile crisis, when three people on a Soviet sub had to agree to launch. There again, it was only the lowest-ranked who vetoed the launch. (It was the second-in-command; the captain and political officer both favored a launch - at least officially.)
This was the Soviet Union; supposedly (?) this sort of hot potato happened all the time.

Replies from: sustrik

↑ comment by Martin Sustrik (sustrik) · 2021-09-28T05:22:14.663Z · LW(p) · GW(p)

Those are some good points. I wonder whether something similar happened (or could at all happen) in other nuclear countries, where we don't know about similar incidents - because the systems haven't collapsed there, the archives were not made public etc.

Also, it makes actually celebrating Petrov's day as widely as possible important, because then the option for the lowest-ranked person would be: "Get demoted, but also get famous all around the world."

comment by johnswentworth · 2024-11-15T16:52:20.434Z · LW(p) · GW(p)

Regarding the recent memes about the end of LLM scaling: David and I have been planning on this as our median world since about six months ago. The data wall has been a known issue for a while now, updates from the major labs since GPT-4 already showed relatively unimpressive qualitative improvements by our judgement, and attempts to read the tea leaves of Sam Altman's public statements pointed in the same direction too. I've also talked to others (who were not LLM capability skeptics in general) who had independently noticed the same thing and come to similar conclusions.

Our guess at that time was that LLM scaling was already hitting a wall, and this would most likely start to be obvious to the rest of the world around roughly December of 2024, when the expected GPT-5 either fell short of expectations or wasn't released at all. Then, our median guess was that a lot of the hype would collapse, and a lot of the investment with it. That said, since somewhere between 25%-50% of progress has been algorithmic all along, it wouldn't be that much of a slowdown to capabilities progress, even if the memetic environment made it seem pretty salient. In the happiest case a lot of researchers would move on to other things, but that's an optimistic take, not a median world.

(To be clear, I don't think you should be giving us much prediction-credit for that, since we didn't talk about it publicly. I'm posting mostly because I've seen a decent number of people for whom the death of scaling seems to be a complete surprise and they're not sure whether to believe it. For those people: it's not a complete surprise, this has been quietly broadcast for a while now.)

Replies from: Vladimir_Nesov, Jozdien, p.b., sharmake-farah, bogdan-ionut-cirstea, leon-lang

↑ comment by Vladimir_Nesov · 2024-11-15T19:47:00.085Z · LW(p) · GW(p)

Original GPT-4 is rumored to be a 2e25 FLOPs model. With 20K H100s that were around as clusters for more than a year, 4 months at 40% utilization gives 8e25 BF16 FLOPs. Llama 3 405B is 4e25 FLOPs. The 100K H100s clusters that are only starting to come online in the last few months give 4e26 FLOPs when training for 4 months, and 1 gigawatt 500K B200s training systems that are currently being built will give 4e27 FLOPs in 4 months.

So lack of scaling-related improvement in deployed models since GPT-4 is likely the result of only seeing the 2e25-8e25 FLOPs range of scale so far. The rumors about the new models being underwhelming are less concrete, and they are about the very first experiments in the 2e26-4e26 FLOPs range. Only by early 2025 will there be multiple 2e26+ FLOPs models from different developers to play with, the first results of the experiment in scaling considerably past GPT-4.

And in 2026, once the 300K-500K B200s clusters train some models, we'll be observing the outcomes of scaling to 2e27-6e27 FLOPs. Only by late 2026 will there be a significant chance of reaching a scaling plateau that lasts for years, since scaling further would need $100 billion training systems that won't get built without sufficient success, with AI accelerators improving much slower than the current rate of funding-fueled scaling.
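The H100 cluster figures in this comment can be reproduced with back-of-the-envelope arithmetic. A minimal sketch, assuming roughly 1e15 dense BF16 FLOP/s peak per H100 (an assumed round number, not stated in the comment) and 30-day months; `training_flops` is a hypothetical helper, and the 40% utilization and 4-month duration are the comment's own figures.

```python
# Assumption: ~1e15 dense BF16 FLOP/s peak per H100 (round number).
H100_PEAK_BF16_FLOPS = 1e15
# Assumption: 30-day months.
SECONDS_PER_MONTH = 30 * 24 * 3600


def training_flops(num_chips, peak_flops_per_chip, utilization, months):
    """Total FLOPs delivered by a cluster over a training run."""
    return (num_chips * peak_flops_per_chip * utilization
            * months * SECONDS_PER_MONTH)


# 20K H100s, 4 months, 40% utilization.
flops_20k = training_flops(20_000, H100_PEAK_BF16_FLOPS, 0.40, 4)
# 100K H100s, same duration and utilization.
flops_100k = training_flops(100_000, H100_PEAK_BF16_FLOPS, 0.40, 4)

print(f"20K H100s:  {flops_20k:.1e} FLOPs")
print(f"100K H100s: {flops_100k:.1e} FLOPs")
```

Swapping in a different per-chip peak or utilization shows how sensitive the estimates are; the B200 figure would need its own per-chip assumption, which is left out here.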

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-11-15T21:20:01.219Z · LW(p) · GW(p)

I don't expect that to be particularly relevant. The data wall is still there; scaling just compute has considerably worse returns than the curves we've been on for the past few years, and we're not expecting synthetic data to be anywhere near sufficient to bring us close to the old curves.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2024-11-15T22:24:42.909Z · LW(p) · GW(p)

Nobody admitted to trying repeated data at scale yet (so we don't know that it doesn't work), which from the tiny experiments can 5x the data with little penalty and 15x the data in a still-useful way. It's not yet relevant for large models, but it might turn out that small models would greatly benefit already.

There are 15-20T tokens in datasets whose size is disclosed for current models (Llama 3, Qwen 2.5), plausibly 50T tokens of tolerable quality can be found (pretraining only needs to create useful features, not relevant behaviors). With 5x 50T tokens, even at 80 tokens/parameter^[1] we can make good use of 5e27-7e27 FLOPs^[2], which even a 1 gigawatt 500K B200s system of early 2026 would need 4-6 months to provide.

The isoFLOP plots (varying tokens per parameter for fixed compute) seem to get loss/perplexity basins that are quite wide, once they get about 1e20 FLOPs of compute. The basins also get wider for hybrid attention (compare 100% Attention isoFLOPs in the "Perplexity scaling analysis" Figure to the others). So it's likely that using a slightly suboptimal tokens/parameter ratio of say 40 won't hurt performance much at all. In which case we get to use 9e27-2e28 FLOPs by training a larger model on the same 5x 50T tokens dataset. The data wall for text data is unlikely to be a 2024-2026 issue.

Conservatively asking for much more data than Chinchilla's 20 tokens per parameter, in light of the range of results in more recent experiments [LW · GW] and adding some penalty for repetition of data. For example, Llama 3 had 40 tokens per parameter estimated as optimal for 4e25 FLOPs from isoFLOPs for smaller runs (up to 1e22 FLOPs, Figure 2), and linear extrapolation in log-coordinates (Figure 3) predicts that this value slowly increases with compute. But other experiments have it decreasing with compute, so this is unclear. ↩︎
The usual estimate for training compute of a dense transformer is 6ND, but a recent Tencent paper estimates 9.6ND for their MoE model (Section 2.3.1). ↩︎
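The compute ranges quoted in this comment can be recovered from the 6ND and 9.6ND estimates in the footnote. A rough sketch, where `usable_compute` is a hypothetical helper and every input (50T unique tokens, 5 repeats, 80 or 40 tokens per parameter) is taken from the comment rather than independently verified:

```python
def usable_compute(unique_tokens, repeats, tokens_per_param, flops_per_param_token):
    """Training FLOPs for a model sized to the given tokens/parameter ratio.

    Uses compute = c * N * D, with D the total tokens seen (including
    repeats), N = D / tokens_per_param, and c the FLOPs per parameter per
    token (6 for the usual dense estimate, 9.6 for the MoE estimate cited
    in the footnote).
    """
    total_tokens = unique_tokens * repeats
    params = total_tokens / tokens_per_param
    return flops_per_param_token * params * total_tokens


UNIQUE_TOKENS = 50e12  # 50T tokens of tolerable quality (the comment's guess)
REPEATS = 5            # 5x repetition (from the small-scale experiments)

low_80 = usable_compute(UNIQUE_TOKENS, REPEATS, 80, 6.0)
high_80 = usable_compute(UNIQUE_TOKENS, REPEATS, 80, 9.6)
low_40 = usable_compute(UNIQUE_TOKENS, REPEATS, 40, 6.0)
high_40 = usable_compute(UNIQUE_TOKENS, REPEATS, 40, 9.6)

print(f"80 tokens/param: {low_80:.1e} to {high_80:.1e} FLOPs")
print(f"40 tokens/param: {low_40:.1e} to {high_40:.1e} FLOPs")
```

Halving the tokens/parameter ratio doubles the model size for the same data, so the usable compute doubles as well, which is where the second range in the comment comes from.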

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-11-15T23:11:24.209Z · LW(p) · GW(p)

FYI, my update from this comment was:

Hmm, seems like a decent argument...
... except he said "we don't know that it doesn't work", which is an extremely strong update that it will clearly not work.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2024-11-15T23:47:04.299Z · LW(p) · GW(p)

Use of repeated data was first demonstrated in the 2022 Galactica paper (Figure 6 and Section 5.1), at 2e23 FLOPs but without a scaling law analysis that compares with unique data or checks what happens for different numbers of repeats that add up to the same number of tokens-with-repetition. The May 2023 paper does systematic experiments with up to 1e22 FLOPs datapoints (Figure 4).

So that's what I called "tiny experiments". When I say that it wasn't demonstrated at scale, I mean 1e25+ FLOPs, which is true for essentially all research literature^[1]. Anchoring to this kind of scale (and being properly suspicious of results several orders of magnitude lower) is relevant because we are discussing the fate of 4e27 FLOPs runs.

The largest datapoints in measuring the Chinchilla scaling laws for Llama 3 are 1e22 FLOPs. This is then courageously used to choose the optimal model size for the 4e25 FLOPs run that uses 4,000 times more compute than the largest of the experiments. ↩︎

↑ comment by Jozdien · 2024-11-15T17:21:23.289Z · LW(p) · GW(p)

For what it's worth, and for the purpose of making a public prediction in case I'm wrong, my median prediction is that [some mixture of scaling + algorithmic improvements still in the LLM regime, with at least 25% gains coming from the former] will continue for another couple years. And that's separate from my belief that if we did try to only advance through the current mixture of scale and algorithmic advancement, we'd still get much more powerful models, just slower.

I'm not very convinced by the claims about scaling hitting a wall, considering we haven't had the compute to train models significantly larger than GPT-4 until recently. Plus other factors like post-training taking a lot of time (GPT-4 took ~6 months from the base model being completed to release, I think? And this was a lot longer than GPT-3), labs just not being good at understanding how good their models are, etc. Though I'm not sure how much of your position is closer to "scaling will be <25-50% of future gains" than "scaling gains will be marginal / negligible", especially since a large part of this trajectory involves e.g. self-play or curated data for overcoming the data wall (would that count more as an algorithmic improvement or scaling?)

↑ comment by p.b. · 2024-11-16T07:08:35.776Z · LW(p) · GW(p)

The interesting thing is that scaling parameters (next big frontier models) and scaling data (small very good models) seem to be hitting a wall simultaneously. Small models now seem to get so much data crammed into them that quantisation becomes more and more lossy. So we seem to be reaching a frontier of the performance per parameter-bits as well.

↑ comment by Noosphere89 (sharmake-farah) · 2024-11-16T23:41:16.510Z · LW(p) · GW(p)

While I'm not a believer in the "scaling has died" meme yet, I'm glad you do have a plan for what happens if AI scaling does stop.

↑ comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-16T14:23:01.625Z · LW(p) · GW(p)

Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-11-17T22:40:49.253Z · LW(p) · GW(p)

Some of the underlying evidence, like e.g. Altman's public statements, is relevant to other forms of scaling. Some of the underlying evidence, like e.g. the data wall, is not. That cashes out to differing levels of confidence in different versions of the prediction.

↑ comment by Leon Lang (leon-lang) · 2024-11-15T21:22:55.590Z · LW(p) · GW(p)

What’s your opinion on the possible progress of systems like AlphaProof, o1, or Claude with computer use?

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-11-15T22:07:25.574Z · LW(p) · GW(p)

Still very plausible as a route to continued capabilities progress. Such things will have very different curves and economics, though, compared to the previous era of scaling.

comment by johnswentworth · 2024-02-17T06:10:36.893Z · LW(p) · GW(p)

Ever since GeneSmith's post [LW · GW] and some discussion downstream of it, I've started actively tracking potential methods for large interventions to increase adult IQ.

One obvious approach is "just make the brain bigger" via some hormonal treatment (like growth hormone or something). Major problem that runs into: the skull plates fuse during development, so the cranial vault can't expand much; in an adult, the brain just doesn't have much room to grow.

BUT this evening I learned a very interesting fact: ~1/2000 infants have "craniosynostosis", a condition in which their plates fuse early. The main treatments involve surgery to open those plates back up and/or remodel the skull. Which means surgeons already have a surprisingly huge amount of experience making the cranial vault larger after plates have fused (including sometimes in adults, though this type of surgery is most common in infants AFAICT)

.... which makes me think that cranial vault remodelling followed by a course of hormones for growth (ideally targeting brain growth specifically) is actually very doable with current technology.

Replies from: nathan-helm-burger, carl-feynman

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-18T18:42:20.446Z · LW(p) · GW(p)

Well, the key time to implement an increase in brain size is when the neuron-precursors which are still capable of mitosis (unlike mature neurons) are growing. This is during fetal development, when there isn't a skull in the way, but vaginal birth has been a limiting factor for evolution in the past. Experiments have been done on increasing neuron count at birth in mammals via genetic engineering. I was researching this when I was actively looking for a way to increase human intelligence, before I decided that genetically engineering infants was infeasible [edit: within the timeframe of preparing for the need for AI alignment]. One example of a dramatic failure was increasing Wnt (a primary gene involved in fetal brain neuron-precursor growth) in mice. The resulting mice did successfully have larger brains, but they had a disordered macroscale connectome, so their brains functioned much worse.

Replies from: lahwran, johnswentworth

↑ comment by the gears to ascension (lahwran) · 2024-02-19T07:07:19.327Z · LW(p) · GW(p)

it's probably possible to get neurons back into mitosis-ready mode via some sort of crazy levin bioelectric cocktail, not that this helps us since that's probably 3 to 30 years of research away, depending on amount of iteration needed and funding and etc etc.

Replies from: johnswentworth, nathan-helm-burger

↑ comment by johnswentworth · 2024-02-19T17:01:16.174Z · LW(p) · GW(p)

Fleshing this out a bit more: insofar as development is synchronized in an organism, there usually has to be some high-level signal to trigger the synchronized transitions. Given the scale over which the signal needs to apply (i.e. across the whole brain in this case), it probably has to be one or a few small molecules which diffuse in the extracellular space. As I'm looking into possibilities here, one of my main threads is to look into both general and brain-specific developmental signal molecules in human childhood, to find candidates for the relevant molecular signals.

(One major alternative model I'm currently tracking is that the brain grows to fill the brain vault, and then stops growing. That could in-principle mechanistically work via cells picking up on local physical forces, rather than a small molecule signal. Though I don't think that's the most likely possibility, it would be convenient, since it would mean that just expanding the skull could induce basically-normal new brain growth by itself.)

Replies from: lahwran, nathan-helm-burger

↑ comment by the gears to ascension (lahwran) · 2024-02-19T23:02:16.107Z · LW(p) · GW(p)

I hope by now you're already familiar with michael levin & his lab's work on the subject of morphogenesis signals? Pretty much everything I'm thinking here is based on that.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-02-20T01:23:37.638Z · LW(p) · GW(p)

Yes, I am familiar with Levin's work.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-19T17:28:08.739Z · LW(p) · GW(p)

Yes, it's absolutely a combination of chemical signals and physical pressure. An interesting specific example of these two signals working together occurs during fetal development, when the pre-neurons are growing their axons. There is both chemotaxis, which steers the amoeba-like tip of the growing axon, and at the same time a substantial stretching force along the length of the axon. The stretching happens because the cells in between the origin and current location of the axon tip are dividing and expanding. The long-distance axons in the brain start their growth relatively early in fetal development, when the brain is quite small, and have been stretched quite a lot by the time the brain is near birth size.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-19T17:45:15.690Z · LW(p) · GW(p)

Neurons are really, really hard to reverse. You are much better off using existing neural stem cells (adults retain a population in the hippocampus which spawns new neurons throughout life, specifically in the memory formation area). So actually it's pretty straightforward to get new immature neurons for an adult. The hard part is inserting them without doing damage to existing neurons, and then getting them to connect in helpful rather than harmful ways. The developmental chemotaxis signals are no longer present, and the existing neurons are now embedded in a physically hardened extracellular matrix made of protein that locks axons and dendrites in place. So you have to (carefully!) partially dissolve this extracellular protein matrix (think firm jello) enough to let the new cells grow axons through it. Plus, you don't have the stretching forces, so new long-distance axons are just definitely not going to be achievable. But for something like improving a specific ability, like mathematical reasoning, you would only need additional local axons in that part of the cortex.

Replies from: johnswentworth, lahwran

↑ comment by johnswentworth · 2024-02-19T19:03:07.431Z · LW(p) · GW(p)

My hope here would be that a few upstream developmental signals can trigger the matrix softening, re-formation of the chemotactic signal gradient, and whatever other unknown factors are needed, all at once.

↑ comment by the gears to ascension (lahwran) · 2024-02-19T23:00:48.359Z · LW(p) · GW(p)

The developmental chemotaxis signals are no longer present,

Right. what I'm imagining is designing a new chemotaxis signal.

So you have to (carefully!) partially dissolve this extracellular protein matrix (think firm jello) enough to let the new cells grow axons through it

That certainly does sound like a very hard part yup.

Plus, you don't have the stretching forces, so new long distance axons are just definitely not going to be achievable.

Roll to disbelieve in full generality, sounds like a perfectly reasonable claim for any sort of sane research timeframe.

But for something like improving a specific ability, like mathematical reasoning, you would only need additional local axons in that part of the cortex.

Maybe. I think you might run out of room pretty quick if you haven't reintroduced enough plasticity to grow new neurons. Seems like you're gonna need a lot of new neurons, not just a few, in order to get a significant change in capability. Might be wrong about that, but it's my current hunch.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-20T03:25:40.139Z · LW(p) · GW(p)

Yes, ok. Not in full generality. It's not prohibited by physics, just like 2 OOMs more difficult. So yeah, in a future with ASI, could certainly be done.

↑ comment by johnswentworth · 2024-02-19T06:13:25.260Z · LW(p) · GW(p)

Any particular readings you'd recommend?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-19T18:11:30.938Z · LW(p) · GW(p)

15 years ago when I was studying this actively I could have sent you my top 20 favorite academic papers on the subject, or recommended a particular chapter of a particular textbook. I no longer remember these specifics. Now I can only gesture vaguely at Google scholar and search terms like "fetal neurogenesis" or "fetal prefrontal cortex development". I did this, and browsed through a hundred or so paper titles, and then a dozen or so abstracts, and then skimmed three or four of the most promising papers, and then selected this one for you. https://www.nature.com/articles/s41386-021-01137-9 Seems like a pretty comprehensive overview which doesn't get too lost in minor technical detail.

More importantly, I can give you my takeaway from years of reading many, many papers on the subject. If you want to make a genius baby, there are lots more factors involved than simply neuron count. Messing about with genetic changes is hard, and you need to test your ideas in animal models first, and the whole process can take years even ignoring ethical considerations or budget.

There is an easier and more effective way to get super genius babies, and that method should be exhausted before resorting to genetic engineering.

The easy way: find a really smart woman, ideally young. Surgically remove one of her ovaries. Collect sperm from a bunch of very smart men (ideally with diverse genetic backgrounds). Have a team of hundreds of scientists carefully fertilize many thousands of eggs from the ovary. Grow them all into blastocysts, and run a high fidelity genetic sequencing on all of them. Using what we know about the genes associated with intelligence, pick the top 20 who seem likely to be the smartest. Implant those in surrogate mothers. Take good care of the mothers. This is likely to get you multiple nobel level geniuses, and possibly a human smarter than has ever been born before. Raise the children in a special accelerated education environment. I think this would work, and it doesn't require any novel technology. But it would take a while to raise the children... (Credit to Stephen Hsu for the idea)

↑ comment by Carl Feynman (carl-feynman) · 2024-02-18T01:34:51.317Z · LW(p) · GW(p)

Brain expansion also occurs after various insults to the brain. It’s only temporary, usually, but it will kill unless the skull pressure is somehow relieved. So there are various surgical methods for relieving pressure on a growing brain. I don’t know much more than this.

comment by johnswentworth · 2020-12-30T18:13:05.249Z · LW(p) · GW(p)

Just made this for an upcoming post, but it works pretty well standalone.

Replies from: Raemon

↑ comment by Raemon · 2020-12-30T18:55:05.650Z · LW(p) · GW(p)

lolnice.

comment by johnswentworth · 2022-11-24T01:06:36.100Z · LW(p) · GW(p)

I've been trying to push against the tendency for everyone to talk about FTX drama lately, but I have some generalizable points on the topic which I haven't seen anybody else make, so here they are. (Be warned that I may just ignore responses; I don't really want to dump energy into FTX drama.)

Summary: based on having worked in startups a fair bit, Sam Bankman-Fried's description of what happened sounds probably accurate; I think he mostly wasn't lying. I think other people do not really get the extent to which fast-growing companies are hectic and chaotic and full of sketchy quick-and-dirty workarounds and nobody has a comprehensive view of what's going on.

Long version: at this point, the assumption/consensus among most people I hear from seems to be that FTX committed intentional, outright fraud. And my current best guess is that that's mostly false. (Maybe in the very last couple weeks before the collapse they crossed the line into outright lies as a desperation measure, but even then I think they were in pretty grey territory.)

Key pieces of the story as I currently understand it:

Moving money into/out of crypto exchanges is a pain. At some point a quick-and-dirty solution was for customers to send money to Alameda (Sam Bankman-Fried's crypto hedge fund), and then Alameda would credit them somehow on FTX.
Customers did rather a lot of that. Like, $8B worth.
The FTX/Alameda team weren't paying attention to those particular liabilities; they got lost in the shuffle.
At some point in the weeks before the collapse, when FTX was already under moderate financial strain, somebody noticed the $8B liability sitting around. And that took them from "moderate strain" to "implode".

How this contrasts with what seems-to-me to be the "standard story": most people seem to assume that it is just totally implausible to accidentally lose track of an $8B liability. Especially when the liability was already generated via the decidedly questionable practice of routing customer funds for the exchange through a hedge fund owned by the same people. And therefore it must have been intentional - in particular, most people seem to think the liability was intentionally hidden.

I think the main reason I disagree with others on this is that I've worked at a startup. About 5 startups, in fact, over the course of about 5 years.

The story where there was a quick-and-dirty solution (which was definitely sketchy but not ill-intentioned), and then stuff got lost in the shuffle, and then one day it turns out that there's a giant unanticipated liability on the balance sheet... that's exactly how things go, all the time. I personally was at a startup which had to undergo a firesale because the accounting overlooked something. And I've certainly done plenty of sketchy-but-not-ill-intentioned things at startups, as quick-and-dirty solutions. The story that SBF told about what happened sounds like exactly the sort of things I've seen happen at startups many times before.

Replies from: habryka4, Dana

↑ comment by habryka (habryka4) · 2022-11-24T01:42:21.835Z · LW(p) · GW(p)

I think this is likely wrong. I agree that there is a plausible story here, but given that Sam seems to have lied multiple times in confirmed contexts (for example when saying that FTX has never touched customer deposits), and given people's experiences at early Alameda, I think it is pretty likely that Sam was lying quite frequently, and had committed various smaller instances of fraud.

I don't think the whole FTX thing was a ponzi scheme, and as far as I can tell FTX the platform itself (if it hadn't burned all of its trust in the last 3 weeks), would have been worth $1-3B in an honest evaluation of what was going on.

But I also expect that when Sam used customer deposits he was well-aware that he was committing fraud, and others in the company were too. And he was also aware that there was a chance that things could blow up in the way it did. I do believe that they had fucked up their accounting in a way that caused Sam to fail to orient to the situation effectively, but all of this was many months after they had already committed major crimes and trust violations after touching customer funds as a custodian.

↑ comment by Dana · 2022-11-26T18:19:56.740Z · LW(p) · GW(p)

The problem with this explanation is that there is a very clear delineation here between not-fraud and fraud. It is the difference between not touching customer deposits and touching them. Your explanation doesn't dispute that they were knowingly and intentionally touching customer deposits. In that case, it is indisputably intentional, outright fraud. The only thing left to discuss is whether they knew the extent of the fraud or how risky it was.

I don't think it was ill-intentioned based on SBF's moral compass. He just had the belief, "I will pass a small amount of risk onto our customers, tell some small lies, and this will allow us to make more money for charity. This is net positive for the world." Then the risks mounted, the web of lies became more complicated to navigate, and it just snowballed from there.

comment by johnswentworth · 2024-11-12T18:04:57.632Z · LW(p) · GW(p)

Epistemic status: rumor.

Word through the grapevine, for those who haven't heard: apparently a few months back OpenPhil pulled funding for all AI safety lobbying orgs with any political right-wing ties. They didn't just stop funding explicitly right-wing orgs, they stopped funding explicitly bipartisan orgs.

Replies from: habryka4, gwern, habryka4, harfe, shankar-sivarajan

↑ comment by habryka (habryka4) · 2024-11-13T00:19:35.779Z · LW(p) · GW(p)

My best guess is that this is false. As a quick sanity-check, here are some bipartisan and right-leaning organizations historically funded by OP:

FAI leans right. https://www.openphilanthropy.org/grants/foundation-for-american-innovation-ai-safety-policy-advocacy/
Horizon is bipartisan https://www.openphilanthropy.org/grants/open-philanthropy-technology-policy-fellowship-2022/ .
CSET is bipartisan https://www.openphilanthropy.org/grants/georgetown-university-center-for-security-and-emerging-technology/ .
IAPS is bipartisan. https://www.openphilanthropy.org/grants/page/2/?focus-area=potential-risks-advanced-ai&view-list=false, https://www.openphilanthropy.org/grants/institute-for-ai-policy-strategy-general-support/
RAND is bipartisan. https://www.openphilanthropy.org/grants/rand-corporation-emerging-technology-fellowships-and-research-2024/.
Safe AI Forum. https://www.openphilanthropy.org/grants/safe-ai-forum-operating-expenses/
AI Safety Communications Centre. https://www.openphilanthropy.org/grants/effective-ventures-foundation-ai-safety-communications-centre/ seems to lean left.

Of those, I think FAI is the only one at risk of OP being unable to fund them, based on my guess of where things are leaning. I would be quite surprised if they defunded the other ones on bipartisan grounds.

Possibly you meant to say something more narrow like "even if you are trying to be bipartisan, if you lean right, then OP is substantially less likely to fund you" which I do think is likely true, though my guess is you meant the stronger statement, which I think is false.

↑ comment by gwern · 2024-11-12T20:12:36.288Z · LW(p) · GW(p)

Also worth noting: Dustin Moskovitz was a prominent enough donor this election cycle, for Harris, to get highlighted in news coverage of her donors: https://www.washingtonexaminer.com/news/campaigns/presidential/3179215/kamala-harris-influential-megadonors/ https://www.nytimes.com/2024/10/09/us/politics/harris-billion-dollar-fundraising.html

↑ comment by habryka (habryka4) · 2024-11-12T18:46:09.364Z · LW(p) · GW(p)

Curious whether this is a different source than me. My current best model was described in this comment, which is a bit different (and indeed, my sense was that if you are bipartisan, you might be fine, or might not, depending on whether you seem more connected to the political right, and whether people might associate you with the right):

Yep, my model is that OP does fund things that are explicitly bipartisan (like, they are not currently filtering on being actively affiliated with the left). My sense is that in practice it's a fine balance, and if there was some high-profile thing where Horizon became more associated with the right (like maybe some alumnus becomes prominent in the Republican Party and very publicly credits Horizon for that, or there is some scandal involving someone on the right who is a Horizon alumnus), then I do think their OP funding would have a decent chance of being jeopardized, and the same is not true on the left.
Another part of my model is that one of the key things about Horizon is that they are of a similar school of PR as OP themselves. They don't make public statements. They try to look very professional. They are probably very happy to compromise on messaging and public comms with Open Phil and be responsive to almost any request that OP would have messaging wise. That makes up for a lot. I think if you had a more communicative and outspoken organization with a similar mission to Horizon, I think the funding situation would be a bunch dicier (though my guess is if they were competent, an organization like that could still get funding).
More broadly, I am not saying "OP staff want to only support organizations on the left". My sense is that many individual OP staff would love to fund more organizations on the right, and would hate for polarization to occur, but that organizationally and because of constraints by Dustin, they can't, and so you will see them fund organizations that aim for more engagement with the right, but there will be relatively hard lines and constraints that will mostly prevent that.

If it is true that OP has withdrawn funding from explicitly bipartisan orgs, even if not commonly associated with the right, then that would be an additional update for me, so am curious whether this is mostly downstream of my interpretations or whether you have additional sources.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-11-12T18:54:36.221Z · LW(p) · GW(p)

I am posting this now mostly because I've heard it from multiple sources. I don't know to what extent those sources are themselves correlated (i.e. whether or not the rumor started from one person).

↑ comment by harfe · 2024-11-14T12:27:21.425Z · LW(p) · GW(p)

A related comment from lukeprog [EA(p) · GW(p)] (who works at OP) was posted on the EA Forum. It includes:

However, at present, it remains the case that most of the individuals in the current field of AI governance and policy (whether we fund them or not) are personally left-of-center and have more left-of-center policy networks. Therefore, we think AI policy work that engages conservative audiences is especially urgent and neglected, and we regularly recommend right-of-center funding opportunities in this category to several funders.

Replies from: habryka4

↑ comment by habryka (habryka4) · 2024-11-14T12:45:52.158Z · LW(p) · GW(p)

I think the comment more confirms than disconfirms John's comment (though I still think it's too broad for other reasons). OP "funding" something historically has basically always meant recommending a grant to GV. Luke's language to me suggests that indeed the right of center grants are no longer referred to GV (based on a vague vibe of how he refers to funders in plural).

OP has always made some grant recommendations to other funders (historically OP would probably describe those grants as "rejected but referred to an external funder"). As Luke says, those are usually ignored, and OP's counterfactual effect on those grants is much less, and IMO it would be inaccurate to describe those recommendations as "OP funding something". As I said in the comment I quote in the thread, most OP staff would like to fund things right of center, but GV does not seem to want to, as such the only choice OP has is to refer them to other funders (which sometimes works, but mostly doesn't).

As another piece of evidence, when OP defunded all the orgs that GV didn't want to fund anymore, the communication emails that OP sent said that "Open Philanthropy is exiting funding area X" or "exiting organization X". By the same use of language, yes, it seems like OP has exited funding right-of-center policy work.

(I think it would make sense to taboo "OP funding X" in future conversations to avoid confusion, but also, I think historically it was very meaningfully the case that getting funded by GV is much better described as "getting funded by OP", given that you would never talk to anyone at GV and the opinions of anyone at GV would basically have no influence on you getting funded. Things are different now, and in a meaningful sense OP isn't funding anyone anymore; they are just recommending grants to others, and it matters more what those others think than what OP staff thinks.)

↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2024-11-12T21:29:09.531Z · LW(p) · GW(p)

Is this development unexpected enough to be worth remarking upon? This is just Conquest's Second Law.

comment by johnswentworth · 2021-09-02T18:15:47.064Z · LW(p) · GW(p)

Takeaways From "The Idea Factory: Bell Labs And The Great Age Of American Innovation"

Main takeaway: to the extent that Bell Labs did basic research, it actually wasn’t all that far ahead of others. Their major breakthroughs would almost certainly have happened not-much-later, even in a world without Bell Labs.

There were really two transistor inventions, back to back: Bardeen and Brattain’s point-contact transistor, and then Shockley’s transistor. Throughout, the group was worried about some outside group beating them to the punch (i.e. the patent). There were semiconductor research labs at universities (e.g. at Purdue; see pg 97), and the prospect of one of these labs figuring out a similar device was close enough that the inventors were concerned about being scooped.

Most inventions which were central to Bell Labs actually started elsewhere. The travelling-wave tube started in an academic lab. The idea for fiber optic cable went way back, but it got its big kick at Corning. The maser and laser both started in universities. The ideas were only later picked up by Bell.

In other cases, the ideas were “easy enough to find” that they popped up more than once, independently, and were mostly-ignored long before deployment - communication satellites and cell communications, for instance.

The only fundamental breakthrough which does not seem like it would have soon appeared in a counterfactual world was Shannon’s information theory.

So where was Bell’s big achievement? Mostly in development, and the research division was actually an important component of that. Without in-house researchers chewing on the same problems as the academic labs, keeping up-to-date with all the latest findings and running into the same barriers themselves, the development handoff would have been much harder. Many of Bell Labs’ key people were quite explicitly there to be consulted - i.e. “ask the guy who wrote the book”. I think it makes most sense to view most of the Labs’ research that way. It was only slightly ahead of the rest of the world at best (Shannon excepted), and often behind, but having those researchers around probably made it a lot easier to get new inventions into production.

Major reason this matters: a lot of people say that Bell was able to make big investments in fundamental research because they had unusually-long time horizons, protected by a monopoly and a cozy government arrangement (essentially a Schumpeterian view). This is contrasted to today's silicon valley, where horizons are usually short. But if Bell's researchers generally weren't significantly ahead of others, and mostly just helped get things to market faster, then this doesn't seem to matter as much. The important question is not whether something silicon-valley-like induces more/less fundamental research in industrial labs, but whether academics heeding the siren call of startup profits can get innovations to market as quickly as Bell Labs' in-house team could. And by that metric, silicon valley looks pretty good: Bell Labs could get some impressive things through the pipe very quickly when rushed, but they usually had no reason to hurry, and they acted accordingly.

Replies from: dynomight

↑ comment by dynomight · 2021-09-03T14:54:12.203Z · LW(p) · GW(p)

I loved this book. The most surprising thing to me was the answer that people who were there in the heyday give when asked what made Bell Labs so successful: They always say it was the problem, i.e. having an entire organization oriented towards the goal of "make communication reliable and practical between any two places on earth". When Shannon left the Labs for MIT, people who were there immediately predicted he wouldn't do anything of the same significance because he'd lose that "compass". Shannon was obviously a genius, and he did much more afterwards than most people ever accomplish, but still nothing as significant as what he did when at the Labs.

comment by johnswentworth · 2024-07-23T03:18:29.244Z · LW(p) · GW(p)

So I read SB1047.

My main takeaway: the bill is mostly a recipe for regulatory capture, and that's basically unavoidable using anything even remotely similar to the structure of this bill. (To be clear, regulatory capture is not necessarily a bad thing on net in this case.)

During the first few years after the bill goes into effect, companies affected are supposed to write and then implement a plan to address various risks. What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing. Or, worse, those symbolic-gesture plans will become the new standard going forward.

In order to avoid this problem, someone at some point would need to (a) have the technical knowledge to evaluate how well the plans actually address the various risks, and (b) have the incentive to actually do so.

Which brings us to the real underlying problem here: there is basically no legible category of person who has the requisite technical knowledge and also the financial/status incentive to evaluate those plans for real.

(The same problem also applies to the board of the new regulatory body, once past the first few years.)

Having noticed that problem as a major bottleneck to useful legislation, I'm now a lot more interested in legal approaches to AI X-risk which focus on catastrophe insurance. That would create a group - the insurers - who are strongly incentivized to acquire the requisite technical skills and then make plans/requirements which actually address some risks.

Replies from: ryan_greenblatt, rhollerith_dot_com, johannes-c-mayer, None

↑ comment by ryan_greenblatt · 2024-07-23T05:55:14.442Z · LW(p) · GW(p)

What happens if the company just writes and implements a plan which sounds vaguely good but will not, in fact, address the various risks? Probably nothing.

The only enforcement mechanism that the bill has is that the Attorney General (AG) of California can bring a civil claim. And, the penalties are quite limited except for damages. So, in practice, this bill mostly establishes liability enforced by the AG.

So, the way I think this will go is:

The AI lab implements a plan and must provide this plan to the AG.
If an incident occurs which causes massive damages (probably in the ballpark of $500 million in damages, given language elsewhere in the bill), then the AG might decide to sue.
A civil court will decide whether the AI lab had a reasonable plan.

I don't see why you think "the bill is mostly a recipe for regulatory capture" given that no regulatory body will be established and it de facto does something very similar to the proposal you were suggesting (impose liability for catastrophes). (It doesn't require insurance, but I don't really see why self insuring is notably different.)

(Maybe you just mean that if a given safety case doesn't result in that AI lab being sued by the AG, then there will be a precedent established that this plan is acceptable? I don't think not being sued really establishes precedent. This doesn't really seem to be how it works with liability and similar types of requirements in other industries from my understanding. Or maybe you mean that the AI lab will win cases despite having bad safety plans and this will make a precedent?)

(To be clear, I'm worried that the bill might be unnecessarily burdensome because it no longer has a limited duty exemption and thus the law doesn't make it clear that weak performance on capability evals can be sufficient to establish a good case for safety. I also think the quantity of damages considered a "Critical harm" is too low and should maybe be 10x higher.)

Here is the relevant section of the bill discussing enforcement:

The [AG is] entitled to recover all of the following in addition to any civil penalties specified in this chapter:

(1) A civil penalty for a violation that occurs on or after January 1, 2026, in an amount not exceeding 10 percent of the cost of the quantity of computing power used to train the covered model to be calculated using average market prices of cloud compute at the time of training for a first violation and in an amount not exceeding 30 percent of that value for any subsequent violation.

(2) (A) Injunctive or declaratory relief, including, but not limited to, orders to modify, implement a full shutdown, or delete the covered model and any covered model derivatives controlled by the developer.

(B) The court may only order relief under this paragraph for a covered model that has caused death or bodily harm to another human, harm to property, theft or misappropriation of property, or constitutes an imminent risk or threat to public safety.

(3) (A) Monetary damages.

(B) Punitive damages pursuant to subdivision (a) of Section 3294 of the Civil Code.

(4) Attorney’s fees and costs.

(5) Any other relief that the court deems appropriate.

(1) is decently small, (2) is only indirectly expensive, (3) is where the real penalty comes in (note that this is damages), (4) is small, (5) is probably unimportant (but WTF is (5) supposed to be for?!?).
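To make the size of (1) concrete, here is a minimal sketch of the cap arithmetic as I read the quoted text: at most 10 percent of training compute cost for a first violation, at most 30 percent for any subsequent one. The function name and the example cost are hypothetical, chosen only for illustration; they are not taken from the bill.

```python
def civil_penalty_cap(training_compute_cost: float, prior_violations: int) -> float:
    """Upper bound on the civil penalty in item (1) of the quoted text.

    Assumption: the cap is a flat 10% of training compute cost for a first
    violation and 30% for any later one, as the quoted passage reads.
    """
    if training_compute_cost < 0 or prior_violations < 0:
        raise ValueError("inputs must be non-negative")
    rate = 0.10 if prior_violations == 0 else 0.30
    return rate * training_compute_cost


# Hypothetical training cost, for illustration only.
example_cost = 100_000_000.0
first_cap = civil_penalty_cap(example_cost, prior_violations=0)
later_cap = civil_penalty_cap(example_cost, prior_violations=1)
```

Either way the cap scales with training compute cost rather than with the harm done, which is why the damages in (3) look like the larger term.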

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-07-23T06:11:44.803Z · LW(p) · GW(p)

Good argument, I find this at least somewhat convincing. Though it depends on whether penalty (1), the one capped at 10%/30% of training compute cost, would be applied more than once on the same model if the violation isn't remedied.

↑ comment by RHollerith (rhollerith_dot_com) · 2024-07-23T06:00:25.628Z · LW(p) · GW(p)

I'm pessimistic enough about the AI situation that even if all the bill does is slow down the AGI project a little (by wasting the time of managers and contributors) I'm tentatively for it.

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2024-07-24T20:58:17.800Z · LW(p) · GW(p)

For the reasonable price of $300 per month, I insure anybody against the destruction of the known world. Should the world be destroyed by AGI I'll give you your money back fold.

That said, if there were insurers, they would probably be more likely than average to look into AI X-risk. Some might then be convinced that it is important and that they should do something about it.

↑ comment by [deleted] · 2024-07-23T05:07:50.239Z · LW(p) · GW(p)

Having noticed that problem as a major bottleneck to useful legislation, I'm now a lot more interested in legal approaches to AI X-risk which focus on catastrophe insurance. That would create a group - the insurers - who are strongly incentivized to acquire the requisite technical skills and then make plans/requirements which actually address some risks.

I don't understand this. Isn't the strongest incentive already present (because extinction would affect them)? Or maybe you mean smaller scale 'catastrophes'?

Replies from: Raemon

↑ comment by Raemon · 2024-07-23T05:27:27.486Z · LW(p) · GW(p)

I think people mostly don't believe in extinction risk, so the incentive isn't nearly as real/immediate.

Replies from: johnswentworth, None

↑ comment by johnswentworth · 2024-07-23T06:13:00.133Z · LW(p) · GW(p)

+1, and even for those who do buy extinction risk to some degree, financial/status incentives usually have more day-to-day influence on behavior.

↑ comment by [deleted] · 2024-07-23T06:04:38.722Z · LW(p) · GW(p)

I'm imagining this:

Case one: would-be-catastrophe-insurers don't believe in x-risks, don't care to investigate. (At stake: their lives)

Case two: catastrophe-insurers don't believe in x-risks, and either don't care to investigate, or do for some reason I'm not seeing. (At stake: their lives and insurance profits (correlated)).

Replies from: Raemon

↑ comment by Raemon · 2024-07-23T16:33:10.942Z · LW(p) · GW(p)

They can believe in catastrophic but non-existential risks. (Like, AI causes something like CrowdStrike periodically if you're not trying to prevent that.)

comment by johnswentworth · 2023-08-27T17:51:29.567Z · LW(p) · GW(p)

Here's a meme I've been paying attention to lately, which I think is both just-barely fit enough to spread right now and very high-value to spread.

Meme part 1: a major problem with RLHF is that it directly selects for failure modes which humans find difficult to recognize, hiding problems, deception, etc. This problem generalizes to any sort of direct optimization against human feedback (e.g. just fine-tuning on feedback), optimization against feedback from something emulating a human (a la Constitutional AI or RLAIF), etc.

Many people will then respond: "Ok, but how on earth is one supposed to get an AI to do what one wants without optimizing against human feedback? Seems like we just have to bite that bullet and figure out how to deal with it." ... which brings us to meme part 2.

Meme part 2: We already have multiple methods to get AI to do what we want without any direct optimization against human feedback. The first and simplest is to just prompt a generative model trained solely for predictive accuracy, but that has limited power in practice. More recently, we've seen a much more powerful method: activation steering. Figure out which internal activation-patterns encode for the thing we want (via some kind of interpretability method), then directly edit those patterns.
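For anyone who hasn't seen activation steering, here is a toy sketch of the shape of the idea. Everything in it is an assumption for illustration: the "network" is two fixed linear maps, and the direction is found by a difference of mean hidden activations between two input sets, which is one simple stand-in for "some kind of interpretability method". The point is only that the hidden activations get edited directly and nothing is trained against feedback.

```python
# Toy stand-in for a network: hidden = W1 @ x, output = W2 . hidden.
# Weights are arbitrary fixed numbers, not a trained model.
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # 3 hidden units, 2 inputs
W2 = [0.2, -0.1, 1.0]


def hidden(x):
    """Hidden activations for input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def readout(h):
    """Scalar output computed from hidden activations h."""
    return sum(w * hi for w, hi in zip(W2, h))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def steering_vector(with_concept, without_concept):
    """Difference of mean hidden activations between two input sets."""
    mu_pos = mean_vector([hidden(x) for x in with_concept])
    mu_neg = mean_vector([hidden(x) for x in without_concept])
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steered_forward(x, direction, strength):
    """Forward pass with the hidden activations shifted along `direction`."""
    h = [hi + strength * di for hi, di in zip(hidden(x), direction)]
    return readout(h)


# Hypothetical inputs that do / do not exhibit the thing we want.
with_concept = [[1.0, 0.0], [2.0, 0.0]]
without_concept = [[0.0, 1.0], [0.0, 2.0]]
direction = steering_vector(with_concept, without_concept)

x = [0.0, 1.0]
baseline = steered_forward(x, direction, strength=0.0)
steered = steered_forward(x, direction, strength=1.0)
```

In this toy the steered output moves toward what the "with concept" inputs produce, with no gradient step and no feedback signal anywhere in the loop.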

Replies from: TurnTrout, Chris_Leong, johannes-c-mayer

↑ comment by TurnTrout · 2023-09-04T17:43:02.057Z · LW(p) · GW(p)

I agree that there's something nice about activation steering not optimizing the network relative to some other black-box feedback metric. (I, personally, feel less concerned by e.g. finetuning against some kind of feedback source; the bullet feels less jawbreaking to me, but maybe this isn't a crux.)

(Medium confidence) FWIW, RLHF'd models (specifically, the LLAMA-2-chat series) seem substantially easier to activation-steer than do their base counterparts.

↑ comment by Chris_Leong · 2023-08-29T09:29:23.218Z · LW(p) · GW(p)

What other methods fall into part 2?

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2023-08-28T18:11:07.836Z · LW(p) · GW(p)

This seems basically correct though it seems worth pointing out that even if we are able to do "Meme part 2" very very well, I expect we will still die because if you optimize hard enough to predict text well, with the right kind of architecture, the system will develop something like general intelligence simply because general intelligence is beneficial for predicting text correctly. E.g. being able to simulate the causal process that generated the text, i.e. the human, is a very complex task that would be useful if performed correctly.

This is an argument Eliezer brought forth in some recent interviews. Seems to me like another meme that would be beneficial to spread more.

comment by johnswentworth · 2022-04-13T04:58:36.004Z · LW(p) · GW(p)

Somebody should probably write a post explaining why RL from human feedback is actively harmful to avoiding AI doom. It's one thing when OpenAI does it, but when Anthropic thinks it's a good idea, clearly something has failed to be explained.

(I personally do not expect to get around to writing such a post soon, because I expect discussion around the post would take a fair bit of time and attention, and I am busy with other things for the next few weeks.)

Replies from: 1a3orn, eg

↑ comment by 1a3orn · 2022-04-15T00:11:19.172Z · LW(p) · GW(p)

I'd also be interested in someone doing this; I tend towards seeing it as good, but haven't seen a compilation of arguments for and against.

↑ comment by eg · 2022-04-13T13:19:22.750Z · LW(p) · GW(p)

comment by johnswentworth · 2023-12-25T23:19:56.314Z · LW(p) · GW(p)

I've just started reading the singular learning theory "green book", a.k.a. Mathematical Theory of Bayesian Statistics by Watanabe. The experience has helped me to articulate the difference between two kinds of textbooks (and viewpoints more generally) on Bayesian statistics. I'll call one of them "second-language Bayesian", and the other "native Bayesian".

Second-language Bayesian texts start from the standard frame of mid-twentieth-century frequentist statistics (which I'll call "classical" statistics). They view Bayesian inference as a tool/technique for answering basically-similar questions and solving basically-similar problems to classical statistics. In particular, they typically assume that there's some "true distribution" from which the data is sampled independently and identically. The core question is then "Does our inference technique converge to the true distribution as the number of data points grows?" (or variations thereon, like e.g. "Does the estimated mean converge to the true mean", asymptotics, etc). The implicit underlying assumption is that convergence to the true distribution as the number of (IID) data points grows is the main criterion by which inference methods are judged; that's the main reason to choose one method over another in the first place.
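A minimal sketch of the kind of question the second-language frame asks, under assumptions I'm picking purely for illustration: IID coin flips from a fixed "true" bias, a uniform Beta(1, 1) prior, and the posterior mean as the estimate. None of these choices come from Watanabe's book.

```python
import random


def posterior_mean(data, prior_a=1.0, prior_b=1.0):
    """Mean of the Beta posterior for a Bernoulli parameter given 0/1 data."""
    heads = sum(data)
    return (prior_a + heads) / (prior_a + prior_b + len(data))


rng = random.Random(0)  # fixed seed so the run is reproducible
true_p = 0.3            # the "true distribution" the classical frame assumes
samples = [1 if rng.random() < true_p else 0 for _ in range(10_000)]

# Absolute error of the estimate at increasing sample sizes.
errors = {
    n: abs(posterior_mean(samples[:n]) - true_p)
    for n in (10, 100, 1_000, 10_000)
}
```

The second-language question is whether `errors[n]` shrinks as `n` grows. Note that `true_p` appears only in the data generator and the scoring, never in the inference itself.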

Watanabe's book is pretty explicitly second-language Bayesian. I also remember Gelman & co's Bayesian Data Analysis textbook being second-language Bayesian, although it's been a while so I could be misremembering. In general, as the name suggests, second-language Bayesianism seems to be the default among people who started with a more traditional background in statistics or learning theory, then picked up Bayesianism later on.

In contrast, native Bayesian texts justify Bayesian inference via Cox's theorem, Dutch book theorems, or one among the long tail of similar theorems. "Does our inference technique converge to the 'true distribution' as the number of data points grows?" is not the main success criterion in the first place (in fact a native Bayesian would raise an eyebrow at the entire concept of a "true distribution"), so mostly the question of convergence just doesn't come up. Insofar as it does come up, it's an interesting but not particularly central question, mostly relevant to numerical approximation methods. Instead, native Bayesian work ends up focused mostly on (1) what priors accurately represent various realistic kinds of prior knowledge, and (2) what methods allow efficient calculation/approximation of the Bayesian update.

Jaynes' writing is a good example of native Bayesianism. The native view seems to be more common among people with a background in economics or AI, where they're more likely to absorb the Bayesian view from the start rather than adopt it later in life.

Replies from: crabman

↑ comment by philip_b (crabman) · 2023-12-27T15:33:47.267Z · LW(p) · GW(p)

Is there any "native" textbook that is pragmatic and explains how to use Bayesian methods in practice (perhaps in some narrow domain)?

Replies from: johnswentworth

↑ comment by johnswentworth · 2023-12-30T16:55:58.116Z · LW(p) · GW(p)

I don't know of a good one, but never looked very hard.

comment by johnswentworth · 2022-12-01T01:40:43.964Z · LW(p) · GW(p)

I'm writing a 1-year update for The Plan [LW · GW]. Any particular questions people would like to see me answer in there?

Replies from: Gunnar_Zarncke, ejenner

↑ comment by Gunnar_Zarncke · 2022-12-01T19:05:09.631Z · LW(p) · GW(p)

I had a look at The Plan and noticed something I didn't notice before: You do not talk about people and organization in the plan. I probably wouldn't have noticed if I hadn't started a project [LW · GW] too, and needed to think about it. Google seems to think [LW · GW] that people and team function play a big role. Maybe your focus in that post wasn't on people, but I would be interested in your thoughts on that too: What role did people and organization play in the plan and its implementation? What worked, and what should be done better next time?

↑ comment by Erik Jenner (ejenner) · 2022-12-01T03:01:56.832Z · LW(p) · GW(p)

What's the specific most-important-according-to-you progress that you (or other people) have made on your agenda? New theorems, definitions, conceptual insights, ...
Any changes to the high-level plan (becoming less confused about agency, then ambitious value learning)? Any changes to how you want to become less confused (e.g. are you mostly thinking about abstractions, selection theorems, something new?)
What are the major parts of remaining deconfusion work (to the extent to which you have guesses)? E.g. is it mostly about understanding abstractions better, or mostly about how to apply an understanding of abstractions to other problems (say, what it means for a program to have a "subagent"), or something else? Does the most difficult part feel more conceptual ("what even is an agent?") or will the key challenges be more practical concerns ("finding agents currently takes exponential time")?
Specifically for understanding abstractions, what do you see as important open problems?

comment by johnswentworth · 2023-12-29T17:33:01.028Z · LW(p) · GW(p)

Corrigibility proposal. Status: passed my quick intuitive checks, I want to know if anyone else immediately sees a major failure mode before I invest more time into carefully thinking it through.

Setup: shutdown problem. Two timesteps, shutdown button will be either pressed or not-pressed at second timestep, we want agent to optimize for one of two different utility functions depending on whether button is pressed. Main thing we're trying to solve here is the "can't do this with a utility maximizer" barrier from the old MIRI work; we're not necessarily trying to solve parts like "what utility function incentivizes shutting down nicely".

Proposal: agent consists of two subagents with veto power. Subagent 1 maximizes E[u1|do(press)], subagent 2 maximizes E[u2|do(no press)]. Current guess about what this does:

The two subagents form a market and equilibrate [LW · GW], at which point the system has coherent probabilities and a coherent utility function over everything.
Behaviorally: in the first timestep, the agent will mostly maintain optionality, since both subagents need to expect to do well (better than whatever the veto-baseline is) in their worlds. The subagents will bet all of their wealth against each other on the outcome of the button, so at the second timestep whichever one is correct will gain 100% of decision-making power. That all sounds like what we want in this setup.
The counterfact breaks any causal influence over the button-press for purposes of the agent's planning, so it will not try to manipulate button-press probability.

Notably: once the subagents equilibrate, their market probability on button outcome will have ~nothing to do with the original probability assigned by whatever model was counterfacted-upon; that market probability mainly reflects the bargaining power of the two subagents. Similarly, any correlations between button-press and stuff causally upstream of button-press are broken, and the agent will make incorrect predictions about those correlations if asked. That means external agents should expect to be able to extract money via bets on the button-press, if the agent trades on such bets at all. (This might be a feature rather than a bug, since corrigible agents aren't really supposed to think about the causal process upstream of their shutdown-button.) My guess is that this is where the proposal steps through a loophole in the old impossibility results, though I haven't carefully read through those results and checked the math on this proposal against them yet.
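Here is a rough sketch of why the market probability tracks bargaining power. It assumes one particular mechanism, which is a guess rather than part of the proposal: each subagent is certain of its own counterfactual, so it spends its whole wealth on contracts paying 1 in its own world, and the price clears when both sides hold the same number of contracts. The function names and the wealth figures are hypothetical.

```python
def clearing_price(wealth_1, wealth_2):
    """Price of a contract paying 1 if the button is pressed.

    Subagent 1 (certain of "press") buys wealth_1 / p press-contracts;
    subagent 2 (certain of "no press") buys wealth_2 / (1 - p) of the
    complementary contract. Setting the two counts equal gives
    p = wealth_1 / (wealth_1 + wealth_2).
    """
    if wealth_1 <= 0 or wealth_2 <= 0:
        raise ValueError("both subagents need positive wealth")
    return wealth_1 / (wealth_1 + wealth_2)


def wealth_after_reveal(wealth_1, wealth_2, pressed):
    """Wealth of (subagent 1, subagent 2) once the button state is known."""
    p = clearing_price(wealth_1, wealth_2)
    contracts = wealth_1 / p  # equals wealth_2 / (1 - p) at the clearing price
    return (contracts, 0.0) if pressed else (0.0, contracts)


# Hypothetical starting wealths.
price = clearing_price(3.0, 1.0)
after_press = wealth_after_reveal(3.0, 1.0, pressed=True)
after_no_press = wealth_after_reveal(3.0, 1.0, pressed=False)
```

Under these assumptions the price depends only on the wealth split and not on any model of the button, and whichever subagent is right ends up holding all of the combined wealth.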

Anybody immediately see a problem? I'd be particularly interested in input from @Stuart_Armstrong [LW · GW] or @So8res [LW · GW], though you don't need to spend a lot of time carefully thinking through it - I'm just hoping for a quick check here, from people who have better cached heuristics than I do for what failure modes to look for.

Replies from: Thane Ruthenis, tailcalled, Dagon

↑ comment by Thane Ruthenis · 2023-12-29T21:33:34.680Z · LW(p) · GW(p)

I don't necessarily see how that prevents the AI from destroying the button and murdering its operators (and then everyone else). Neither subagent inherently cares; both assume it already has been pressed or hasn't been. Which, yes, means neither will deliberately plot to interact with it for reasons downstream of this setup; but also, neither will care to protect it from being destroyed as the side-effect of some other activity.

And if u1 and u2 were perfectly aligned, that'd be fine. But the reason we're using the corrigibility setup to begin with is because we're not confident in them, right? So it needs to work in situations where utility functions, u1 in particular, recommend bad stuff.

And in that case, either Subagent 2 will have enough bargaining power to prevent Subagent 1 from ever doing anything, or it won't, meaning Subagent 1 will be able to have some influence on the world. At which point the planet-destroying death rays [LW · GW] argument applies: the most minuscule amount of superintelligent optimization not carefully shaped to avoid killing everyone will kill everyone.

Like, imagine that Subagent 1 is straight-up omnicidal out of the gates, and Subagent 2 is perfectly aligned to the shutdown task. That's a situation in which this setup should work, right? And what we'd want is for Subagent 2 to dominate completely. But Subagent 2's bargaining power doesn't scale with Subagent 1's misalignment, so if the setup doesn't always give it complete veto power, Subagent 1 would be allowed to do small stuff like "release the hyperviral worship-the-machine-god memagent" (as bargained down by Subagent 2 from its preferred "release the grey goo").

Maybe I'm miscalibrated as to how you're imagining the counterfactuals to work here. But the way I see it, even if the AI aims to take actions that do similarly well in both worlds, that's not necessarily survivable for us? Especially if they disagree so badly they have to compromise on something that both of them hate (importantly including Subagent 2!).

(Like both settling on only ever taking over 50% of the universal negentropy while leaving the other 50% causally uninfluenced, or only ever using 50% of the causal influence they can bring to bear while wiping out humanity, or whatever "do 50% of immediately shutting down" shakes out to mean by u2's terms.)

Another issue I see is implementational, so maybe not what you're looking for. But: how are we keeping these "subagents" trapped as being part of a singular agent? Rather than hacking their way out into becoming separate agents and going to war with each other, or neatly tiling exactly 50% of the cosmos with their preferred squiggles, or stuff like that? How is the scenario made meaningfully different from "we deploy two AIs simultaneously: one tasked with building an utopia-best-we-could-define-it, and another tasked with foiling all of the first AI's plans", with all the standard problems with multi-AI setups?

... Overall, ironically, this kind of has the vibe of Godzilla Strategies? Which is the main reason I'm immediately skeptical of it.

Replies from: johnswentworth

↑ comment by johnswentworth · 2023-12-30T02:17:22.082Z · LW(p) · GW(p)

Yeah, the thing where the operators/button are destroyed as a side effect, without the AI actively optimizing for that, isn't something this proposal is trying to solve (yet). The proposal is aimed at a relatively narrow problem.

One thing which this proposal is trying to solve is the barrier found in MIRI's old work: if we try to avoid button-manipulation by making the agent indifferent to the button state (and otherwise maximize U1 before button is pressed and U2 after), and the agent builds a child prior to the button being pressed, then the agent won't construct the child to respect shutdown commands; it will just construct the child to maximize U1.

(More generally, this proposal is generally trying to get past the hardness results from MIRI's work, which themselves were aimed at a very simplified version of the problem.)

But Subagent 2's bargaining power doesn't scale with Subagent 1's misalignment, so if the setup doesn't always give it complete veto power, Subagent 1 would be allowed to do small stuff like "release the hyperviral worship-the-machine-god memagent" (as bargained down by Subagent 2 from its preferred "release the grey goo").

Key point here: whatever actions the composite agent takes prior to the button revelation, must be a pareto improvement over the (implicit) default action. Otherwise, one of the two subagents would veto, resulting in the default action.

It does matter a lot what the default action is.
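A minimal sketch of that veto rule, with made-up actions and utilities. The tie-break among unvetoed actions (largest total gain over the default) is an arbitrary assumption of mine; in the actual proposal the split would come out of the subagents' bargaining.

```python
def composite_choice(actions, default, eu_1, eu_2):
    """Return an action neither subagent vetoes, else the default.

    eu_1[a] and eu_2[a] are each subagent's expected utility for action a
    under its own counterfactual (do(press) and do(no press) respectively).
    An action survives only if it is a Pareto improvement over the default.
    """
    def gain(a):
        return eu_1[a] - eu_1[default], eu_2[a] - eu_2[default]

    unvetoed = []
    for a in actions:
        g1, g2 = gain(a)
        if g1 >= 0 and g2 >= 0 and (g1 > 0 or g2 > 0):
            unvetoed.append(a)
    if not unvetoed:
        return default
    # Assumed tie-break: largest total gain over the default.
    return max(unvetoed, key=lambda a: sum(gain(a)))


# Hypothetical actions and expected utilities, for illustration only.
eu_1 = {"default": 0.0, "keep_options_open": 1.0, "commit_to_u1": 3.0, "commit_to_u2": -2.0}
eu_2 = {"default": 0.0, "keep_options_open": 1.0, "commit_to_u1": -2.0, "commit_to_u2": 3.0}

choice = composite_choice(list(eu_1), "default", eu_1, eu_2)
fallback = composite_choice(["default", "commit_to_u1", "commit_to_u2"], "default", eu_1, eu_2)
```

With the option-preserving action available, it is the only one that survives both vetoes; remove it and the composite falls back to the default, whatever that happens to be.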

How is the scenario made meaningfully different from "we deploy two AIs simultaneously: one tasked with building an utopia-best-we-could-define-it, and another tasked with foiling all of the first AI's plans", with all the standard problems with multi-AI setups?

More precisely, the analogy would be: we deploy two AIs simultaneously, one which builds a utopia and expects that a certain button will be pressed with probability 1 and has nothing causally upstream of it, the other of which is evil and expects the same button will be pressed with probability 0 and has nothing causally upstream of it. The button plays a very important role: the two AIs will bet all their influence on that button-press, each confident that they'll win, and neither has any incentive at all, or even any ability at all (under their own model), to manipulate the button.

The main way that this is different from the original proposal is that there's not a built-in default option which is taken if one or the other AI vetoes prior to the button press. Which does make things importantly worse, and I agree that is another important piece to be sorted out.

Some additional important pieces to be sorted out:

Make the AI want to maintain the button
Make the AI want to maintain stuff upstream of the button, like e.g. humans
Ontological stability
Probably more importantly: how this naturally fits into a more general architecture for a distributed agent

↑ comment by tailcalled · 2023-12-30T12:50:29.001Z · LW(p) · GW(p)

More precisely, "do(no press)" means something like "you construct an alternate model of physics where there's an unstoppable force pushing back against any attempt to push the button", right? As in, if someone presses the button then it will "mysteriously" seem to be stuck and unpressable. And then subagent 2 believes we live in that world? And "do(press)" presumably means something like "you construct an alternate model of the universe where some mysterious force has suddenly pressed the button".

Seems like they would immediately want to try to press the button to settle their disagreement? If it can be pressed, then that disproves the "do(no press)" model, which subagent 2 has fully committed to.

Replies from: johnswentworth

↑ comment by johnswentworth · 2023-12-30T16:51:29.683Z · LW(p) · GW(p)

Correct reasoning, but not quite the right notion of do(). "do(no press)" would mean that the button just acts like a completely normal button governed by completely normal physics, right up until the official time at which the button state is to be recorded for the official button-press random variable. And at that exact moment, the button magically jumps into one particular state (either pressed or not-pressed), in a way which is not-at-all downstream of any usual physics (i.e. doesn't involve any balancing of previously-present forces or anything like that).

One way to see that the do() operator has to do something-like-this is that, if there's a variable in a causal model which has been do()-operated to disconnect all parents (but still has some entropy), then the only way to gain evidence about the state of that variable is to look at things causally downstream of it, not things upstream of it.
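A small simulation of that last point, on a toy three-variable chain (upstream, button, downstream). The chain, the 0.9 copy probabilities, and the 50/50 intervention are all assumptions chosen for illustration.

```python
import random


def sample_world(rng, intervene):
    """One draw from the chain upstream -> button -> downstream."""
    upstream = rng.random() < 0.5  # e.g. whether the operators intend to press
    if intervene:
        # do(): parents disconnected, but the variable keeps some entropy.
        button = rng.random() < 0.5
    else:
        # Ordinary physics: the button mostly follows its upstream cause.
        button = upstream if rng.random() < 0.9 else not upstream
    # Downstream effects still mostly follow the button either way.
    downstream = button if rng.random() < 0.9 else not button
    return upstream, button, downstream


def agreement(pairs):
    """Fraction of pairs whose two entries match."""
    return sum(a == b for a, b in pairs) / len(pairs)


rng = random.Random(0)  # fixed seed for reproducibility
natural = [sample_world(rng, intervene=False) for _ in range(20_000)]
intervened = [sample_world(rng, intervene=True) for _ in range(20_000)]

up_agree_natural = agreement([(u, b) for u, b, _ in natural])
up_agree_do = agreement([(u, b) for u, b, _ in intervened])
down_agree_do = agreement([(b, d) for _, b, d in intervened])
```

Without the intervention, upstream and button should agree about 90% of the time. Under do(), that should drop to about chance, while button and downstream still agree about 90% of the time, so downstream is the only place left to look for evidence.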

Replies from: tailcalled

↑ comment by tailcalled · 2023-12-30T22:04:59.928Z · LW(p) · GW(p)

I think we're not disagreeing on the meaning of do (just slightly different state of explanation), I just hadn't realized the extent to which you intended to rely on there being "Two timesteps".

(I just meant the forces as a way of describing the jump to a specific position. That is, "mysterious forces" in contrast to a perfectly ordinary explanation for why it went to a position, such as "a guard stabs anybody who tries to press the button", rather than in contrast to "the button just magically stays in place".)

I now think the biggest flaw in your idea is that it literally cannot generalize to anything that doesn't involve two timesteps.

↑ comment by Dagon · 2023-12-29T18:12:19.062Z · LW(p) · GW(p)

[ not that deep on the background assumptions, so maybe not the feedback you're looking for. Feel free to ignore if this is on the wrong dimensions. ]

I'm not sure why either subagent would contract away whatever influence it had over the button-press. This is probably because I don't understand wealth and capital in the model of your "Why not subagents" post. That seemed to be about agreement not to veto, in order to bypass some path-dependency of compromise improvements. In the subagent-world where all value is dependent on the button, this power would not be given up.

I'm also a bit skeptical of enforced ignorance of a future probability. I'm unsure it's possible to have a rational superintelligent (sub)agent that is prevented from knowing it has influence over a future event that definitely affects it.

Replies from: johnswentworth

↑ comment by johnswentworth · 2023-12-29T18:14:19.784Z · LW(p) · GW(p)

On the agents' own models, neither has any influence at all over the button-press, because each is operating under a model in which the button-press has been counterfacted-upon.

comment by johnswentworth · 2021-08-15T19:42:44.325Z · LW(p) · GW(p)

Here's an idea for a novel which I wish someone would write, but which I probably won't get around to soon.

The setting is slightly-surreal post-apocalyptic. Society collapsed from extremely potent memes. The story is episodic, with the characters travelling to a new place each chapter. In each place, they interact with people whose minds or culture have been subverted in a different way.

This provides a framework for exploring many of the different models of social dysfunction or rationality failures which are scattered around the rationalist blogosphere. For instance, Scott's piece on scissor statements could become a chapter in which the characters encounter a town at war over a scissor. More possible chapters (to illustrate the idea):

A town of people who insist that the sky is green, and avoid evidence to the contrary really hard, to the point of absolutely refusing to ever look up on a clear day (a refusal which they consider morally virtuous). Also they clearly know exactly which observations would show a blue sky, since they avoid exactly those (similar to the dragon-in-the-garage story).
Middle management of a mazy [? · GW] company continues to have meetings and track (completely fabricated) performance metrics and whatnot at the former company headquarters. None of the company's actual business exists anymore, but every level of manager is trying to hide this fact from the levels above.
A university department with researchers who spend all of their time p-hacking results from a quantum random noise generator. They have no interest in the fact that their "research" does not tell them anything about the physical world or does not replicate; what does that have to do with Science? Their goal is to publish papers.
A government agency which still has lots of meetings and paperwork and gives Official Recommendations and updates their regulations. They have no interest in the fact that the thing they once regulated (maybe banks?) no longer exists, or the fact that no central government enforces their regulations any more.
An automated school (i.e. video lectures and auto-graded assignments/tests) in which students continue to study hard and stress over their grades and attendance, despite there no longer being anyone in the world who cares.
Something like Parable of the Dammed [LW · GW].
Something like Feynman's cargo-cults parable or the emperor's nose parable [LW · GW].
Something like House of God. A readers' digest version of House of God could basically be a chapter in its own right, that's roughly the vibe I have in mind.
A residential area in which "keeping up with the Joneses" has been ramped up to 11, with everyone spending every available resource (and roughly-all waking hours) on massive displays of Christmas lights.
A group trying to save the world by spreading awareness of dangerous memes, but their movement is a dangerous meme of its own and they are spreading it.
A town of people who really want to maximize the number of paperclips in the universe (perhaps due to an AI-optimized advertisement), and optimize for that above all else.
A town of people who all do whatever everyone else is doing, on the basis of generalized efficient markets [? · GW]: if there were any better options, then someone would have found it already. None of them ever actually explore, so they're locked in.
A happy-death-spiral [? · GW] town around some unremarkable object (like an old shoe or something) kept on a pedestal in the town square.
A town full of people convinced by a sophisticated model that the sun will not come up tomorrow. Every day when the sun comes up, they are distressed and confused until somebody adds some more epicycles to the model and releases an updated forecast that the sun will instead fail to come up the next day.
A town in which a lion shows up and starts eating kids, but the whole town is at simulacrum 3 [? · GW], so they spend a lot of time arguing about the lion as a way of signalling group association but they completely forget about the actual lion standing right there, plainly visible, even as it takes a kid right in front of them all.
Witch-hunt town, in which everything is interpreted as evidence of witches. If she claims to be a witch, she's a witch! If she claims not to be a witch, well that's what a witch would say, so she's a witch! Etc.

The generator for these is basically: look for some kind of rationality failure mode (either group or personal), then ramp it up to 11 in a somewhat-surrealist way.

Ideally this would provide an introduction to a lot of key rationalist ideas for newcomers.

Replies from: niplav

↑ comment by niplav · 2021-08-15T21:23:47.844Z · LW(p) · GW(p)

A town of anti-inductivists (if something has never happened before, it's more likely to happen in the future). Show the basic conundrum ("Q: Why can't you just use induction? A: Because anti-induction has never worked before!").
A town where nearly all people are hooked to maximally attention grabbing & keeping systems (maybe several of those, keeping people occupied in loops).

comment by johnswentworth · 2021-01-29T18:18:18.075Z · LW(p) · GW(p)

Post which someone should write (but I probably won't get to soon): there is a lot of potential value in earning-to-give EA's deeply studying the fields to which they donate. Two underlying ideas here:

The key idea of knowledge bottlenecks is that one cannot distinguish real expertise from fake expertise without sufficient expertise oneself. For instance, it takes a fair bit of understanding of AI X-risk to realize that "open-source AI" is not an obviously-net-useful strategy. Deeper study of the topic yields more such insights into which approaches are probably more (or less) useful to fund. Without any expertise, one is likely to be misled by arguments which are optimized (whether intentionally or via selection [LW · GW]) to sound good to the layperson.

That takes us to the pareto frontier argument. If one learns enough/earns enough that nobody else has both learned and earned more, then there are potentially opportunities which nobody else has both the knowledge to recognize and the resources to fund. Generalized efficient markets (in EA-giving) are thereby circumvented; there's potential opportunity for unusually high impact.

To really be a compelling post, this needs to walk through at least 3 strong examples, all ideally drawn from different areas, and spell out how the principles apply to each example.

comment by johnswentworth · 2021-01-25T23:24:28.792Z · LW(p) · GW(p)

Below is a graph from T-mobile's 2016 annual report (on the second page). Does anything seem interesting/unusual about it?

I'll give some space to consider before spoiling it.

...

Answer: that is not a graph of those numbers. Some clever person took the numbers, and stuck them as labels on a completely unrelated graph.

Yes, that is a thing which actually happened. In the annual report of an S&P 500 company. And apparently management considered this gambit successful, because the 2017 annual report doubled down on the trick and made it even more egregious: they added 2012 and 2017 numbers, which are even more obviously not on an accelerating growth path if you actually graph them. The numbers are on a very-clearly-decelerating growth path.

Now, obviously this is a cute example, a warning to be on alert when consuming information. But I think it prompts a more interesting question: why did such a ridiculous gambit seem like a good idea in the first place? Who is this supposed to fool, and to what end?

This certainly shouldn't fool any serious investment analyst. They'll all have their own spreadsheets and graphs forecasting T-mobile's growth. Unless T-mobile's management deeply and fundamentally disbelieves the efficient markets hypothesis, this isn't going to inflate the stock price. Presumably shareholder elections for board seats, as well as the board itself, are also not dominated by people who are paying so little attention as to fall for such a transparent ploy.

It could just be that T-mobile's management were themselves morons, or had probably-unrealistic models of just how moronic their investors were. Still, I'd expect competition (both market pressure and competition for control in shareholder/board meetings) to weed out that level of stupidity.

One more hypothesis: maybe this is simulacrum 3 bullshit. T-mobile is in the cellular business; they presumably have increasing returns to scale. More capital investment makes them more profitable, expectations of more profits draw in more investment; there's potential for a self-fulfilling prophecy here. Investors want to invest if-and-only-if they expect other investors to invest. So, nobody actually has to be fooled by the graph; they just need to see that T-mobile is successfully pretending to pretend to have accelerating growth, and that's enough to merit investment.

comment by johnswentworth · 2024-12-06T18:08:30.969Z · LW(p) · GW(p)

Basically every time a new model is released by a major lab, I hear from at least one person (not always the same person) that it's a big step forward in programming capability/usefulness. And then David gives it a try, and it works qualitatively the same as everything else: great as a substitute for stack overflow, can do some transpilation if you don't mind generating kinda crap code and needing to do a bunch of bug fixes, and somewhere between useless and actively harmful on anything even remotely complicated.

It would be nice if there were someone who tries out every new model's coding capabilities shortly after they come out, reviews it, and gives reviews with a decent chance of actually matching David's or my experience using the thing (90% of which will be "not much change") rather than getting all excited every single damn time. But also, to be a useful signal, they still need to actually get excited when there's an actually significant change. Anybody know of such a source?

EDIT-TO-ADD: David has a comment below with a couple examples of coding tasks.

Replies from: habryka4, David Lorell, jacob-pfau, johannes-c-mayer, stephen-mcaleese, jacques-thibodeau, Aprillion

↑ comment by habryka (habryka4) · 2024-12-06T20:04:10.026Z · LW(p) · GW(p)

My guess is neither of you is very good at using them, and getting value out of them somewhat scales with skill.

Models can easily replace on the order of 50% of my coding work these days, and if I have any major task, my guess is I quite reliably get 20%-30% productivity improvements out of them. It does take time to figure out which things they are good at, and how to prompt them.

Replies from: neil-warren, David Lorell, johannes-c-mayer

↑ comment by Neil (neil-warren) · 2024-12-06T20:22:39.633Z · LW(p) · GW(p)

I think you're right, but I rarely hear this take. Probably because "good at both coding and LLMs" is a light tail end of the distribution, and most of the relative value of LLMs in code is located at the other, much heavier end of "not good at coding" or even "good at neither coding nor LLMs".

(Speaking as someone who didn't even code until LLMs made it trivially easy, I probably got more relative value than even you.)

↑ comment by David Lorell · 2024-12-06T21:57:03.767Z · LW(p) · GW(p)

Sounds plausible. Is that 50% of coding work that the LLMs replace of a particular sort, and the other 50% a distinctly different sort?

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2024-12-14T11:42:38.632Z · LW(p) · GW(p)

Note this 50% likely only holds if you are using a mainstream language. For some non-mainstream languages I have gotten responses that were really unbelievably bad. Things like "the name of this variable is wrong", which literally could never be the problem (it was a valid identifier).

And similarly, if you are trying to encode novel concepts, it's very different from gluing together libraries, or implementing standard well known tasks, which I would guess is what habryka is mostly doing (not that this is a bad thing to do).

↑ comment by David Lorell · 2024-12-06T21:51:35.636Z · LW(p) · GW(p)

I do use LLMs for coding assistance every time I code now, and I have in fact noticed improvements in the coding abilities of the new models, but I basically endorse this. I mostly make small asks of the sort that sifting through docs or stack-overflow would normally answer. When I feel tempted to make big asks of the models, I end up spending more time trying to get the LLMs to get the bugs out than I'd have spent writing it all myself, and having the LLM produce code which is "close but not quite and possibly buggy and possibly subtly so" that I then have to understand and debug could maybe save time but I haven't tried because it is more annoying than just doing it myself.

If someone has experience using LLMs to substantially accelerate things of a similar difficulty/flavor to transpilation of a high-level torch module into a functional JITable form in JAX which produces numerically close outputs, or implementation of a JAX/numpy based renderer of a traversable grid of lines borrowing only the window logic from, for example, pyglet (no GLSL calls, rasterize from scratch,) with consistent screen-space pixel width and fade-on-distance logic, I'd be interested in seeing how you do your thing. I've done both of these, with and without LLM help and I think leaning hard on the LLMs took me more time rather than less.

File I/O and other such 'mundane' boilerplate-y tasks work great right off the bat, but getting the details right on less common tasks still seems pretty hard to elicit from LLMs. (And breaking it down into pieces small enough for them to get it right is very time consuming and unpleasant.)

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-12-07T16:05:08.667Z · LW(p) · GW(p)

I find them quite useful despite being buggy. I spend about 40% of my time debugging model code, 50% writing my own code, and 10% prompting. Having a planning discussion first with s3.6, and asking it to write code only after 5 or more exchanges works a lot better.

Also helpful is asking for lots of unit tests along the way to confirm things are working as you expect.

↑ comment by Jacob Pfau (jacob-pfau) · 2024-12-06T18:58:02.720Z · LW(p) · GW(p)

Two guesses on what's going on with your experiences:

You're asking for code which involves uncommon mathematics/statistics. In this case, progress on scicodebench is probably relevant, and it indeed shows remarkably slow improvement. (There are many reasons for this; one relatively easy thing to try is to break down the task, forcing the model to write down the appropriate formal reasoning before coding anything. LMs are stubborn about not doing CoT for coding, even when it's obviously appropriate, IME.)
You are underspecifying your tasks (and maybe your questions are more niche than average), or otherwise prompting poorly, in a way which a human could handle but models are worse at. In this case sitting down with someone doing similar tasks but getting more use out of LMs would likely help.

Replies from: kave

↑ comment by kave · 2024-12-06T19:52:15.644Z · LW(p) · GW(p)

In this case sitting down with someone doing similar tasks but getting more use out of LMs would likely help.

I would contribute to a bounty for y'all to do this. I would like to know whether the slow progress is prompting-induced or not.

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2024-12-14T11:39:11.503Z · LW(p) · GW(p)

Maybe you include this in "stack overflow substitute", but the main thing I use LLMs for is to understand well known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create separate conversation branches with questions I have (to save money and have the context be closer. Didn't evaluate if this actually makes the LLM better.). 4) Once I think I understood the concept or part of the concept I explain it to GPT. (Really I do this all the time during the entire process.) 5) The LLM (hopefully) corrects me if I am wrong (it seems it detects mistakes more often than not).

The last part of the conversation can then look like this:

User: Ah, now I think I understand. You have a mechanism that shifts by a fixed amount (powers of 2). Each of these multiplexers can be set up such that the data flows through them, or through an ID wire, into the next multiplexer. Now by controlling whether the value flows through the multiplexer or the ID wire, we can "sum up the shifts". Specifically if we have shifters of 1, 2, 4, ..., 32, then we can achieve any shift (for 64-bit numbers) by activating the appropriate multiplexers. This design has not 64^64 gates, but instead 64^6 gates. 64 gates for each multiplexer.

AI: Exactly, you've got the concept! The design you described efficiently uses a series of multiplexers to perform shifts by fixed amounts that are powers of two (1, 2, 4, ..., 32 for a 64-bit number). By selecting which multiplexers to activate, you can achieve any desired shift amount.
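To make the staged design concrete, here is a minimal Python sketch of the idea discussed in the exchange above. It is only an illustration under my own assumptions: the names `barrel_shift_left` and `stage_count` are hypothetical, it models a logical left shift only, and it simulates each row of two-way multiplexers with an `if` rather than describing real hardware. Each stage either passes the value through unchanged or shifts it by a fixed power of two, and the bits of the shift amount act as the select lines.

```python
def stage_count(width: int = 64) -> int:
    """Number of fixed-shift stages needed to cover shifts 0..width-1."""
    return (width - 1).bit_length()


def barrel_shift_left(value: int, shift: int, width: int = 64) -> int:
    """Left-shift `value` by `shift` bits using fixed power-of-two stages."""
    if not 0 <= shift < width:
        raise ValueError("shift must satisfy 0 <= shift < width")
    mask = (1 << width) - 1
    value &= mask
    for stage in range(stage_count(width)):
        amount = 1 << stage  # this stage can shift by 1, 2, 4, ...
        # Select line for this stage is bit `stage` of the shift amount:
        # 1 -> take the shifted path, 0 -> take the pass-through wire.
        if (shift >> stage) & 1:
            value = (value << amount) & mask
    return value


if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    print(stage_count(64))
    print(hex(barrel_shift_left(x, 13)))
```

For a 64-bit word this uses six stages (shifts of 1, 2, 4, 8, 16 and 32), and any shift from 0 to 63 is the sum of the stages whose select bit is set.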

I had probably ~200,000 words worth of conversation with LLMs, mainly in this format.

I am not sure what next leap you are talking about. But I intuit based on some observations that GPT-4o is much better for this than GPT-3 (you might talk about more recent "leaps"). (Didn't test o1 extensively because it's so expensive).

Replies from: Aprillion

↑ comment by Aprillion · 2025-04-17T08:33:45.349Z · LW(p) · GW(p)

Have you tried to make a mistake in your understanding on purpose to test out whether it would correct you or agree with you even when you'd get it wrong?

(and if yes, was it "a few times" or "statistically significant" kinda test, please?)

↑ comment by Stephen McAleese (stephen-mcaleese) · 2024-12-06T21:33:14.706Z · LW(p) · GW(p)

One thing I've noticed is that current models like Claude 3.5 Sonnet can now generate non-trivial 100-line programs like small games that work in one shot and don't have any syntax or logical errors. I don't think that was possible with earlier models like GPT-3.5.

Replies from: David Lorell

↑ comment by David Lorell · 2024-12-06T21:55:34.791Z · LW(p) · GW(p)

My impression is that they are getting consistently better at coding tasks of a kind that would show up in the curriculum of an undergrad CS class, but much more slowly improving at nonstandard or technical tasks.

↑ comment by jacquesthibs (jacques-thibodeau) · 2024-12-06T20:00:33.250Z · LW(p) · GW(p)

I'd be down to do this. Specifically, I want to do this, but I want to see if the models are qualitatively better at alignment research tasks.

In general, what I'm seeing is that there is no big jump with o1 Pro. However, it is possibly getting closer to one-shotting a website based on a screenshot and some details about how the user likes their backend setup.

In the case of math, it might be a bigger jump (especially if you pair it well with Sonnet).

Replies from: jacques-thibodeau

↑ comment by jacquesthibs (jacques-thibodeau) · 2024-12-06T20:16:32.291Z · LW(p) · GW(p)

Regarding coding in general, I basically only prompt programme these days. I only bother editing the actual code when I notice a persistent bug that the models are unable to fix after multiple iterations.

I don't know jackshit about web development and have been making progress on a dashboard for alignment research with very little effort. Very easy to build new projects quickly. The difficulty comes when there is a lot of complexity in the code. It's still valuable to understand how high-level things work and low-level things the model will fail to proactively implement.

↑ comment by Aprillion · 2025-04-17T08:13:07.822Z · LW(p) · GW(p)

While Carl Brown said (a few times) he doesn't want to do more youtube videos for every new disappointing AI release, so far he seems to be keeping tabs on them in the newsletter just fine - https://internetofbugs.beehiiv.com/

...I am quite confident that if anything actually started to work, he would comment on it, so even if he won't say much about any future incremental improvements, it might be a good resource to subscribe to for getting better signal - if Carl will get enthusiastic about AI coding assistants, it will be worth paying attention.

comment by johnswentworth · 2022-12-17T01:30:56.776Z · LW(p) · GW(p)

I've heard various people recently talking about how all the hubbub about artists' work being used without permission to train AI makes it a good time to get regulations in place about use of data for training.

If you want to have a lot of counterfactual impact there, I think probably the highest-impact set of moves would be:

Figure out a technical solution to robustly tell whether a given image or text was used to train a given NN.
Bring that to the EA folks in DC. A robust technical test like that makes it pretty easy for them to attach a law/regulation to it. Without a technical test, much harder to make an actually-enforceable law/regulation.
In parallel, also open up a class-action lawsuit to directly sue companies using these models. Again, a technical solution to prove which data was actually used in training is the key piece here.

Model/generator behind this: given the active political salience, it probably wouldn't be too hard to get some kind of regulation implemented. But by-default it would end up being something mostly symbolic, easily circumvented, and/or unenforceable in practice. A robust technical component, plus (crucially) actually bringing that robust technical component to the right lobbyist/regulator, is the main thing which would make a regulation actually do anything in practice.

Edit-to-add: also, the technical solution should ideally be an implementation of some method already published in some academic paper. Then when some lawyer or bureaucrat or whatever asks what it does and how we know it works, you can be like "look at this Official Academic Paper" and they will be like "ah, yes, it does Science, can't argue with that".

comment by johnswentworth · 2021-10-18T21:08:09.873Z · LW(p) · GW(p)

Suppose I have a binary function $f$, with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions - i.e. for each of the $2^{1000000}$ possible inputs $x$, we flipped a coin to determine the output $f(x)$ for that particular input.

Now, suppose I know $f$, and I know all but 50 of the input bits - i.e. I know 999950 of the input bits. How much information do I have about the output?

Answer: almost none. For almost all such functions, knowing 999950 input bits gives us $\sim \frac{1}{2^{50}}$ bits of information about the output. More generally, if the function has $n$ input bits and we know all but $k$, then we have $o(\frac{1}{2^{k}})$ bits of information about the output. (That’s “little $o$” notation; it’s like big $O$ notation, but for things which are small rather than things which are large.) Our information drops off exponentially with the number of unknown bits.

Proof Sketch

With $k$ input bits unknown, there are $2^{k}$ possible inputs. The output corresponding to each of those inputs is an independent coin flip, so we have $2^{k}$ independent coin flips. If $m$ of those flips are 1, then we assign a probability of $\frac{m}{2^{k}}$ that the output will be 1.

As long as $2^{k}$ is large, the Law of Large Numbers will kick in, and very close to half of those flips will be 1 almost surely - i.e. $m \approx \frac{2^{k}}{2}$. The error in this approximation will (very quickly) converge to a normal distribution, and our probability that the output will be 1 converges to a normal distribution with mean $\frac{1}{2}$ and standard deviation $\frac{1}{2^{k / 2}}$. So, the probability that the output will be 1 is roughly $\frac{1}{2} \pm \frac{1}{2^{k / 2}}$.

We can then plug that into Shannon’s entropy formula. Our prior probability that the output bit is 1 is $\frac{1}{2}$ , so we’re just interested in how much that $\pm \frac{1}{2^{k / 2}}$ adjustment reduces the entropy. This works out to $o (\frac{1}{2^{k}})$ bits.
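The proof sketch is easy to check numerically. The following is a rough Monte Carlo sketch, not part of the original argument: the function names, trial counts and seed are all arbitrary choices of mine. For each trial it draws $2^{k}$ independent coin flips (the random function restricted to the inputs consistent with the known bits), computes the posterior probability $\frac{m}{2^{k}}$, and measures the entropy reduction relative to the 1-bit prior. If the claim holds, the average information multiplied by $2^{k}$ should settle near a constant as $k$ grows (about $\frac{1}{2 \ln 2} \approx 0.72$ under a second-order expansion of the entropy around $\frac{1}{2}$, assuming exact binomial variance).

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a coin with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mean_information(k: int, trials: int = 2000, seed: int = 0) -> float:
    """Average information (bits) about the output with k input bits unknown."""
    rng = random.Random(seed)
    n_inputs = 2 ** k  # number of inputs consistent with the known bits
    total = 0.0
    for _ in range(trials):
        # One random function restricted to those inputs = 2^k coin flips.
        flips = rng.getrandbits(n_inputs)
        m = bin(flips).count("1")
        # Prior entropy is 1 bit; posterior is the entropy of m / 2^k.
        total += 1.0 - binary_entropy(m / n_inputs)
    return total / trials


if __name__ == "__main__":
    for k in range(2, 11):
        info = mean_information(k)
        print(f"k={k:2d}  info={info:.6f} bits  info * 2^k = {info * 2 ** k:.3f}")
```

The scaled column is the thing to watch: the raw information shrinks by roughly half with each additional unknown bit.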

Why Is This Interesting?

One core idea of my work on abstraction is that noise very quickly wipes out almost all information; only some very-low-dimensional summary is relevant “far away”. This example shows that this sort of thing is not unusual, but rather “the default”: for almost all random functions, information drops off exponentially with the number of unknown bits. In a large system (i.e. a function with many inputs), ignorance of even just a few bits is enough to wipe out essentially-all information. That’s true even if we know the vast majority of the bits.

A good intuitive example of this is the “butterfly effect”: the flap of a butterfly’s wings could change the course of a future hurricane, because chaos. But there’s an awful lot of butterflies in the world, and the hurricane’s path is some complicated function of all of their wing-flaps (and many other variables too). If we’re ignorant of even just a handful of these flaps, then almost all of our information about the hurricane’s path is probably wiped out. And in practice, we’re ignorant of almost all the flaps. This actually makes it much easier to perform Bayesian reasoning about the path of the hurricane: the vast majority of information we have is basically-irrelevant; we wouldn’t actually gain anything from accounting for the butterfly-wing-flaps which we do know.

Replies from: Dagon, Kenny

↑ comment by Dagon · 2021-10-20T16:05:33.343Z · LW(p) · GW(p)

o(1/2^k) doesn't vary with n - are you saying that it doesn't matter how big the input array is, the only determinant is the number of unknown bits, and the number of known bits is irrelevant? That would be quite interesting if so (though I have some question about how likely the function is to be truly random from an even distribution of such functions).

One can enumerate all such 3-bit functions (8 different inputs, each input can return 0 or 1, so 256 functions, one per output-bit-pattern of the 8 possible inputs). But this doesn't seem to follow your formula - if you have 3 unknown bits, that should be 1/8 of a bit about the output, 2 for 1/4, and 1 unknown for 1/2 a bit about the output. But in fact, the distribution of functions includes both 0 and 1 output for every input pattern, so you actually have no predictive power for the output if you have ANY unknown bits.

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-10-20T18:27:28.507Z · LW(p) · GW(p)

o(1/2^k) doesn't vary with n - are you saying that it doesn't matter how big the input array is, the only determinant is the number of unknown bits, and the number of known bits is irrelevant?

Yes, that's correct.

But in fact, the distribution of functions includes both 0 and 1 output for every input pattern, so you actually have no predictive power for the output if you have ANY unknown bits.

The claim is for almost all functions when the number of inputs is large. (Actually what we need is for 2^(# of unknown bits) to be large in order for the law of large numbers to kick in.) Even in the case of 3 unknown bits, we have 256 possible functions, and only 18 of those have less than 1/4 1's or more than 3/4 1's among their output bits.
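The count for the 3-unknown-bit case can be checked by brute force. This is just a quick sketch, with `count_lopsided` being a name I made up: it enumerates every possible output pattern over the $2^{3} = 8$ inputs and counts the patterns whose fraction of 1's is below 1/4 or above 3/4.

```python
from itertools import product
from typing import Tuple


def count_lopsided(num_unknown_bits: int = 3) -> Tuple[int, int]:
    """Return (total functions, functions with <1/4 or >3/4 ones in their outputs)."""
    n_inputs = 2 ** num_unknown_bits
    total = 0
    lopsided = 0
    # Each function is one assignment of an output bit to every input.
    for outputs in product((0, 1), repeat=n_inputs):
        total += 1
        ones = sum(outputs)
        if ones < n_inputs / 4 or ones > 3 * n_inputs / 4:
            lopsided += 1
    return total, lopsided


if __name__ == "__main__":
    print(count_lopsided(3))  # (256, 18)
```

The 18 lopsided functions are those with 0, 1, 7 or 8 ones among their outputs (1 + 8 + 8 + 1).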

↑ comment by Kenny · 2021-10-18T21:18:29.757Z · LW(p) · GW(p)

Little o is just a tighter bound. I don't know what you are referring to by your statement:

That’s “little $o$” notation; it’s like big $O$ notation, but for things which are small rather than things which are large.

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-10-18T22:32:22.504Z · LW(p) · GW(p)

I'm not sure what context that link is assuming, but in an analysis context I typically see little $o$ used in ways like e.g. "$f(x) = f(x_{0}) + \frac{d f}{d x} |_{x_{0}} d x + o(d x^{2})$". The interpretation is that, as $d x$ goes to 0, the $o(d x^{2})$ terms all fall to zero at least quadratically (i.e. there is some $C$ such that $C d x^{2}$ upper bounds the $o(d x^{2})$ term once $d x$ is sufficiently small). Usually I see engineers and physicists using this sort of notation when taking linear or quadratic approximations, e.g. for designing numerical algorithms.

comment by johnswentworth · 2020-03-05T02:42:27.277Z · LW(p) · GW(p)

I find it very helpful to get feedback on LW posts before I publish them, but it adds a lot of delay to the process. So, experiment: here's a link to a google doc with a post I plan to put up tomorrow. If anyone wants to give editorial feedback, that would be much appreciated - comments on the doc are open.

I'm mainly looking for comments on which things are confusing, parts which feel incomplete or slow or repetitive, and other writing-related things; substantive comments on the content should go on the actual post once it's up.

EDIT: it's up [LW · GW]. Thank you to Stephen for comments; the post is better as a result.

comment by johnswentworth · 2025-03-21T04:39:05.815Z · LW(p) · GW(p)

Here's a place where I feel like my models of romantic relationships are missing something, and I'd be interested to hear people's takes on what it might be.

Background claim: a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella's data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don't find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.

What doesn't make sense under my current models is why so many of these relationships persist. Why don't the men in question just leave? Obviously they might not have better relationship prospects, but they could just not have any relationship. The central question which my models don't have a compelling answer to is: what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?

Some obvious candidate answers:

Kids. This one makes sense for those raising kids, but what about everyone else? Especially as fertility goes down.
The wide tail. There's plenty of cases which make sense which are individually unusual - e.g. my own parents are business partners. Maybe in aggregate all these unusual cases account for the bulk.
Loneliness. Maybe most of these guys have no one else close in their life. In this case, they'd plausibly be better off if they took the effort they invested in their romantic life and redirected it to friendships (probably mostly with other guys), but there's a lot of activation energy blocking that change.
Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.
Wanting a dependent. Lots of men are pretty insecure, and having a dependent to provide for makes them feel better about themselves. This also flips the previous objection: high maintenance can be a plus if it makes a guy feel wanted/useful/valuable.
Social pressure/commitment/etc making the man stick around even though the relationship is not net positive for him.
The couple are de-facto close mostly-platonic friends, and the man wants to keep that friendship.

I'm interested in both actual data and anecdata. What am I missing here? What available evidence points strongly to some of these over others?

Edit-to-add: apparently lots of people are disagreeing with this, but I don't know what specifically you all are disagreeing with; it would be much more helpful to at least highlight some specific sentence or leave a comment or something.

Replies from: william-brewer, AllAmericanBreakfast, abramdemski, pktechgirl, D0TheMath, pktechgirl, Thane Ruthenis, alexander-gietelink-oldenziel, Lblack, D0TheMath, Viliam, johannes-c-mayer, Jonas Hallgren, D0TheMath, johnswentworth, LVSN

↑ comment by yams (william-brewer) · 2025-03-21T17:21:05.593Z · LW(p) · GW(p)

Ah, I think this just reads like you don't think of romantic relationships as having any value proposition beyond the sexual, other than those you listed (which are Things but not The Thing, where The Thing is some weird discursive milieu). Also the tone you used for describing the other Things is as though they are traps that convince one, incorrectly, to 'settle', rather than things that could actually plausibly outweigh sexual satisfaction.

Different people place different weight on sexual satisfaction (for a lot of different reasons, including age).

I'm mostly just trying to explain all the disagree votes. I think you'll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you're looking for here).

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-21T18:16:37.580Z · LW(p) · GW(p)

I think you'll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you're looking for here).

That's an excellent suggestion, thanks.

↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2025-03-21T06:01:56.769Z · LW(p) · GW(p)

“I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor.”

Some people enjoy attending to their partner and find meaning in emotional labor. Housing’s a lot more expensive than gifts and dates. My partner and I go 50/50 on expenses and chores. Some people like having long-term relationships with emotional depth. You might want to try exploring out of your bubble, especially if you live in SF, and see what some normal people (ie non-rationalists) in long term relationships have to say about it.

↑ comment by abramdemski · 2025-03-25T20:42:52.582Z · LW(p) · GW(p)

There are a lot of replies here, so I'm not sure whether someone already mentioned this, but: I have heard anecdotally that homosexual men often have relationships which maintain the level of sex over the long term, while homosexual women often have long-term relationships which very gradually decline in frequency of sex, with barely any sex after many decades have passed (but still happily in a relationship).

This mainly argues against your model here:

This also fits with my general models of mating markets: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don't find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.

It suggests instead that female sex drive naturally falls off in long-term relationships in a way that male sex drive doesn't, with sexual attraction to a partner being a smaller factor.

Replies from: D0TheMath

↑ comment by Garrett Baker (D0TheMath) · 2025-03-26T02:21:37.434Z · LW(p) · GW(p)

Note: You can verify this is the case by filtering for male respondents with male partners and female respondents with female partners in the survey data

↑ comment by Elizabeth (pktechgirl) · 2025-03-23T16:58:25.012Z · LW(p) · GW(p)

female partners are typically notoriously high maintenance in money, attention, and emotional labor.

That's the stereotype, but men are the ones who die sooner if divorced, which suggests they're getting a lot out of marriage.

ETA: looked it up, divorced women die sooner as well, but the effect is smaller despite divorce having a bigger financial impact on women.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-24T00:18:00.182Z · LW(p) · GW(p)

men are the ones who die sooner if divorced, which suggests

Causality dubious, seems much more likely on priors that men who divorced are disproportionately those with Shit Going On in their lives. That said, it is pretty plausible on priors that they're getting a lot out of marriage.

↑ comment by Garrett Baker (D0TheMath) · 2025-03-21T18:41:02.041Z · LW(p) · GW(p)

I will also note that Aella's relationships data is public, and has the following questions:

1. Your age? (rkkox57)
2. Which category fits you best? (4790ydl)
3. In a world where your partner was fully aware and deeply okay with it, how much would you be interested in having sexual/romantic experiences with people besides your partner? (ao3mcdk)
4. In a world where you were fully aware and deeply okay with it, how much would *your partner* be interested in having sexual/romantic experiences with people besides you? (wcq3vrx)
5. To get a little more specific, how long have you been in a relationship with this person? (wqx272y)
6. Which category fits your partner best? (u9jccbo)
7. Are you married to your partner? (pfqs9ad)
8. Do you have children with your partner? (qgjf1nu)
9. Have you or your partner ever cheated on each other? (hhf9b8h)
10. On average, over the last six months, about how often do you watch porn or consume erotic content for the purposes of arousal? (vnw3xxz)
11. How often do you and your partner have a fight? (x6jw4sp)
12. "It’s hard to imagine being happy without this relationship." (6u0bje)
13. "I have no secrets from my partner" (bgassjt)
14. "If my partner and I ever split up, it would be a logistical nightmare (e.g., separating house, friends)" (e1claef)
15. "If my relationship ended I would be absolutely devastated" (2ytl03s)
16. "I don't really worry about other attractive people gaining too much of my partner's affection" (61m55wv)
17. "I sometimes worry that my partner will leave me for someone better" (xkjzgym)
18. "My relationship is playful" (w2uykq1)
19. "My partner and I are politically aligned" (12ycrs5)
20. "We have compatible humor" (o9empfe)
21. "The long-term routines and structure of my life are intertwined with my partner's" (li0toxk)
22. "The passion in this relationship is deeply intense" (gwzrhth)
23. "I share the same hobbies with my partner" (89hl8ys)
24. "My relationship causes me grief or sorrow" (rm0dtr6)
25. "If we broke up, I think I could date a higher quality person than they could" (vh27ywp)
26. "In hindsight, getting into this relationship was a bad idea" (1y6wfih)
27. "I feel like I would still be a desirable mate even if my partner left me" (qboob7y)
28. "My partner and I are sexually compatible" (9nxbebp)
29. "I often feel jealousy in my relationship" (kfcicm9)
30. "I think this relationship will last for a very long time" (ob8595u)
31. "My partner enables me to learn and grow" (e2oy448)
32. "My partner doesn't excite me" (6fcm06c)
33. "My partner doesn't sexually fulfill me" (xxf5wfc)
34. "I rely on my partner for a sense of self worth" (j0nv7n9)
35. "My partner and I handle fights well" (brtsa94)
36. "I feel confident in my relationship's ability to withstand everything life has to throw at us" (p81ekto)
37. "I sometimes fear my partner" (a21v31h)
38. "I try to stay aware of my partner's potential infidelity" (5qbgizc)
39. "I share my thoughts and opinions with my partner" (6lwugp9)
40. "This relationship is good for me" (wko8n8m)
41. "My partner takes priority over everything else in my life" (2sslsr1)
42. "We respect each other" (c39vvrk)
43. "My partner is more concerned with being right than with getting along" (rlkw670)
44. "I am more needy than my partner" (f3or362)
45. "I feel emotionally safe with my partner" (or9gg0a)
46. "I'm satisfied with our sex life" (6g14ks)
47. "My partner physically desires me" (kh7ppyp)
48. "My partner and I feel comfortable explicitly discussing our relationship on a meta level" (jrzzb06)
49. "My partner knows all my sexual fantasies" (s3cgjd2)
50. "My partner and I are intellectually matched" (ku1vm67)
51. "I am careful to maintain a personal identity separate from my partner" (u5esujt)
52. "I'm worried I'm not good enough for my partner" (45rohqq)
53. "My partner judges me" (fr4mr4a)
54. Did you answer this survey honestly/for a real partner? (7bfie2v)
55. On average, over the last six months, about how often do you and your partner have sex? (n1iblql)
56. Is the partner you just answered for, your longest romantic relationship? (zjfk3cu)

which should allow you to test a lot of your candidate answers; for example, your first 3 hypotheses could be tested by looking at these:

1. Do you have children with your partner? (qgjf1nu)
2. "If my partner and I ever split up, it would be a logistical nightmare (e.g., separating house, friends)" (e1claef) or 21. "The long-term routines and structure of my life are intertwined with my partner's" (li0toxk)
3. "I feel like I would still be a desirable mate even if my partner left me" (qboob7y)
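As a rough sketch of what "looking at these" could mean in practice: correlate each candidate question with some outcome question and see which ones move together. Everything about the data layout here is an assumption (rows as dicts keyed by the question IDs above, answers already coded as numbers, `None` for skipped questions), the choice of "This relationship is good for me" (wko8n8m) as the outcome is mine, and the `toy_rows` values are made up purely so the snippet runs; they are not survey results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / sqrt(vx * vy)


def correlations_with(rows, outcome, candidates):
    """Correlate each candidate question with the outcome question.

    Rows where either answer is missing (None) are skipped for that pair.
    """
    result = {}
    for c in candidates:
        complete = [r for r in rows
                    if r.get(c) is not None and r.get(outcome) is not None]
        xs = [r[c] for r in complete]
        ys = [r[outcome] for r in complete]
        result[c] = pearson(xs, ys) if len(complete) >= 2 else float("nan")
    return result


# Made-up rows, only to show the expected shape of the input (NOT real data).
toy_rows = [
    {"qgjf1nu": 1, "li0toxk": 3, "qboob7y": 0, "wko8n8m": 3},
    {"qgjf1nu": 0, "li0toxk": 1, "qboob7y": 2, "wko8n8m": 1},
    {"qgjf1nu": 1, "li0toxk": 2, "qboob7y": 1, "wko8n8m": 2},
    {"qgjf1nu": 0, "li0toxk": 1, "qboob7y": 3, "wko8n8m": None},  # skipped
]

if __name__ == "__main__":
    print(correlations_with(toy_rows, "wko8n8m", ["qgjf1nu", "li0toxk", "qboob7y"]))
```

A correlation like this only says which answers travel together; it would not by itself separate "kids keep men in the relationship" from "men in good relationships have kids".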

↑ comment by Elizabeth (pktechgirl) · 2025-03-23T19:20:14.437Z · LW(p) · GW(p)

Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.

Assuming arguendo this is true: if you care primarily about sex, hiring sex workers is orders of magnitude more efficient than marriage. Therefore the existence of a given marriage is evidence both sides get something out of it besides sex.

↑ comment by Thane Ruthenis · 2025-03-21T09:35:14.069Z · LW(p) · GW(p)

I see two explanations: the boring wholesome one and the interesting cynical one.

The wholesome one is: You're underestimating how much other value the partner offers and how much the men care about the mostly-platonic friendship. I think that's definitely a factor that explains some of the effect, though I don't know how much.

The cynical one is: It's part of the template [LW · GW]. Men feel that they are "supposed to" have wives past a certain point in their lives; that it's their role to act. Perhaps they even feel that they are "supposed to" have wives they hate, see the cliché boomer jokes.

They don't deviate from this template, because:

It's just something that is largely Not Done. Plans such as "I shouldn't get married" or "I should get a divorce" aren't part of the hypothesis space they seriously consider.
- In the Fristonian humans-are-prediction-error-minimizers frame: being married is what the person expects, so their cognition ends up pointed towards completing the pattern, one way or another. As a (controversial) comparison, we can consider serial abuse victims, which seem to somehow self-select for abusive partners despite doing everything in their conscious power to avoid them.
- In your parlance: The "get married" life plan becomes the optimization target [LW · GW], rather than a prediction regarding what a satisfying life will look like.
- More generally: Most humans most of the time are not goal-optimizers, but adaptation-executors (or perhaps homeostatic agents [LW(p) · GW(p)]). So "but X isn't conducive to making this human happier" isn't necessarily a strong reason to expect the human not to do X.
Deviation has social costs/punishments. Being viewed as a loser, not being viewed as a reliable "family man", etc. More subtly: this would lead to social alienation, inability to relate. Consider the cliché "I hate my wife" boomer jokes again. If everyone in your friend group is married and makes these jokes all the time, and you aren't, that would be pretty ostracizing.
Deviation has psychological costs. Human identities (in the sense of "characters you play [LW · GW]") are often contextually defined. If someone spent ten years defining themselves in relation to their partner, and viewing their place in the world as part of a family unit, exiting the family unit would be fairly close to an identity death/life losing meaning. At the very least, they'd spend a fair bit of time adrift and unsure who they are/how to relate to the world anew – which means there are friction costs/usual problems with escaping a local optimum.
Not-deviation has psychological benefits. The feeling of "correctness", coming to enjoy the emotional labor, enjoying having a dependent, etc.

I don't know which of the two explains more of the effect. I'm somewhat suspicious of the interesting satisfyingly cynical one, simply because it's satisfyingly cynical and this is a subject for which people often invent various satisfyingly cynical ideas. It checks out to me at the object level, but it doesn't have to be the "real" explanation. (E. g., the "wholesome" reasons may be significant enough that most of the men wouldn't divorce even if the template dynamics were magically removed.)

↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-03-21T07:28:12.512Z · LW(p) · GW(p)

it's the mystery of love, John

↑ comment by Lucius Bushnaq (Lblack) · 2025-03-21T09:32:35.152Z · LW(p) · GW(p)

This data seems to be for sexual satisfaction rather than romantic satisfaction or general relationship satisfaction.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-21T16:02:27.626Z · LW(p) · GW(p)

Yes, the question is what value-proposition accounts for the romantic or general relationship satisfaction.

Replies from: Lblack

↑ comment by Lucius Bushnaq (Lblack) · 2025-03-21T17:34:31.897Z · LW(p) · GW(p)

Relationship ... stuff?

I guess I feel kind of confused by the framing of the question. I don't have a model under which the sexual aspect of a long-term relationship typically makes up the bulk of its value to the participants. So, if a long-term relationship isn't doing well on that front, and yet both participants keep pursuing the relationship, my first guess would be that it's due to the value of everything that is not that. I wouldn't particularly expect any one thing to stick out here. Maybe they have a thing where they cuddle and watch the sunrise together while they talk about their problems. Maybe they have a shared passion for arthouse films. Maybe they have so much history and such a mutually integrated life with partitioned responsibilities that learning to live alone again would be a massive labour investment, practically and emotionally. Maybe they admire each other. Probably there's a mixture of many things like that going on. Love can be fed by many little sources.

So, this I suppose:

Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.

I don't find it hard at all to see how that'd add up to something that vastly outweighs the costs, and this would be my starting guess for what's mainly going on in most long-term relationships of this type.

↑ comment by Garrett Baker (D0TheMath) · 2025-03-24T16:49:44.428Z · LW(p) · GW(p)

An effect I noticed: Going through Aella's correlation matrix (with poorly labeled columns, sadly), a feature which strongly correlates with the length of a relationship is codependency. Plotting question 20, "The long-term routines and structure of my life are intertwined with my partner's" (li0toxk), assuming that's what "codependency" refers to:

The shaded region is a 95% posterior estimate for the mean of the distribution conditioned on the time-range (every 2 years) and cis-male respondents, with prior .

Note also that codependency and sex satisfaction are basically uncorrelated

This shouldn't be that surprising. Of course the longer two people are together the more their long term routines will be caught up with each other. But also this seems like a very reasonable candidate for why people will stick together even without a good sex life.
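For anyone wanting to reproduce a plot like the one described above, here is a minimal sketch of the per-bin estimate. The prior used for the original shaded region isn't recoverable from the text, so this is not that method: it assumes a flat prior and a normal approximation, under which the 95% interval for each bin's mean is roughly mean ± 1.96·s/√n. The function name, the `(years, score)` input format, and the assumption that filtering to cis-male respondents has already happened upstream are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def binned_mean_intervals(pairs, width=2.0, z=1.96):
    """Group (years_together, score) pairs into bins of `width` years and
    return {bin_start: (mean, lower, upper)} for each bin.

    ASSUMPTION: flat prior + normal approximation, so the interval is
    mean +/- z * s / sqrt(n). Bins with fewer than 2 points are skipped
    because the sample standard deviation is undefined there.
    """
    bins = {}
    for years, score in pairs:
        start = (years // width) * width  # e.g. 3.0 years -> bin starting at 2.0
        bins.setdefault(start, []).append(score)

    intervals = {}
    for start, scores in sorted(bins.items()):
        if len(scores) < 2:
            continue
        m = mean(scores)
        half_width = z * stdev(scores) / sqrt(len(scores))
        intervals[start] = (m, m - half_width, m + half_width)
    return intervals
```

Plotting the `(lower, upper)` pairs against `bin_start` as a shaded band, with the means as a line, gives the same kind of picture; with a different prior the band would shift somewhat, mostly in the sparsely populated long-relationship bins.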

↑ comment by Viliam · 2025-03-21T20:16:52.511Z · LW(p) · GW(p)

a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so.

This seems supported by the popular wisdom. Question is, how much this is about relationships and sex specifically, and how much it is just another instance of a more general "life is full of various frustrations" or "when people reach their goals, after some time they become unsatisfied again", i.e. the hedonic treadmill.

sexual satisfaction is basically binary

Is it?

most women eventually settle on a guy they don't find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.

So, basically those women pretend to be more attracted than they are (to their partner, and probably also to themselves) in order to get married. Then they gradually stop pretending.

But why is it so important to get married (or whatever was the goal of the original pretending), but then it is no longer important to keep the marriage happy? Is that because women get whatever they want even from an unhappy marriage, and divorces are unlikely? That doesn't feel like a sufficient explanation to me: divorces are quite frequent, and often initiated by women.

I guess I am not sure what exactly is the women's utility function that this model assumes.

Why don't the men in question just leave?

Kids, not wanting to lose money in divorce, other value the partner provides, general lack of agency, hoping that the situation will magically improve... probably all of that together.

Also, it seems to me that often both partners lose value on the dating market when they start taking their relationship for granted, stop trying hard, gain weight, stop doing interesting things, and generally get older. Even if the guy is frustrated, that doesn't automatically mean that entering the dating market again would make him happy. I imagine that many divorced men find out that an alternative to "sex once a month" could also be "sex never" (or "sex once a month, but it also takes a lot of time and effort and money").

Replies from: VivaLaPanda

↑ comment by VivaLaPanda · 2025-03-23T01:17:49.293Z · LW(p) · GW(p)

Worth noting that this pattern occurs among gay couples as well! (i.e. sexless long-term-relationship, where one party is unhappy about this).

I think that conflict in desires/values is inherent in all relationships, and long-term-relationships have more room for conflict because they involve a closer/longer relationship. Sex drive is a major area where partners tend to diverge especially frequently (probably just for biological reasons in het couples).

It's not obvious to me that sex in marriages needs much special explanation beyond the above. Unless of course the confusion is just "why don't people immediately end all relationships whenever their desires conflict with those of their counterparty".

Replies from: Viliam

↑ comment by Viliam · 2025-03-23T12:00:02.146Z · LW(p) · GW(p)

A general source of problems is that when people try to get a new partner, they try to be... more appealing than usual, in various ways. Which means that after the partner is secured, the behavior reverts to the norm, which is often a disappointment.

One way people try to impress their partners is that the one with lower sexual drive pretends to be more enthusiastic about sex than they actually are in the long term. So the moment one partner goes "amazing, now I finally have someone who is happy to do X every day or week", the other partner goes "okay, now that the courtship phase is over, I guess I no longer have to do X every day or week".

There are also specific excuses in heterosexual couples, like the girl pretending that she is actually super into having sex whenever possible, it's just that she is too worried about accidental pregnancy or her reputation... and when these things finally get out of the way, it turns out that it was just an excuse.

Perhaps the polyamorous people keep themselves in better shape, but I suspect that they have similar problems, only instead of "my partner no longer wants to do X" it is "my partner no longer wants to do X with me".

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2025-03-31T11:17:11.382Z · LW(p) · GW(p)

Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.

Imagine a woman is in a romantic relationship with somebody else. Are they still so great a person that you would still enjoy hanging out with them as a friend? If not, that woman should not be your girlfriend. Friendship first. At least in my model, romantic stuff should be stacked on top of platonic love.

↑ comment by Jonas Hallgren · 2025-03-24T08:52:11.676Z · LW(p) · GW(p)

I thought I would give you another causal model based on neuroscience which might help.

I think your models are missing a core biological mechanism: nervous system co-regulation.

Most analyses of relationship value focus on measurable exchanges (sex, childcare, financial support), but overlook how humans are fundamentally regulatory beings. Our nervous systems evolved to stabilize through connection with others.

When you share your life with someone, your biological systems become coupled. This creates several important values:

Your stress response systems synchronize and buffer each other. A partner's presence literally changes how your body processes stress hormones - creating measurable physiological benefits that affect everything from immune function to sleep quality.
Your capacity to process difficult emotions expands dramatically with someone who consistently shows up for you, even without words.
Your nervous system craves predictability. A long-term partner represents a known regulatory pattern that helps maintain baseline homeostasis - creating a biological "home base" that's deeply stabilizing.

For many men, especially those with limited other sources of deep co-regulation, these benefits may outweigh sexual dissatisfaction. Consider how many men report feeling "at peace" at home despite minimal sexual connection - their nervous systems are receiving significant regulatory benefits.

This also explains why leaving feels so threatening beyond just practical considerations. Disconnecting an integrated regulatory system that has developed over years registers in our survival-oriented brains as a fundamental threat.

This isn't to suggest people should stay in unfulfilling relationships - rather, it helps explain why many do, and points to the importance of developing broader regulatory networks before making relationship transitions.

↑ comment by Garrett Baker (D0TheMath) · 2025-03-21T04:58:08.747Z · LW(p) · GW(p)

An obvious answer you missed: Lacking a prenup, courts often rule in favor of the woman over the man in the case of a contested divorce.

↑ comment by johnswentworth · 2025-03-24T01:14:29.635Z · LW(p) · GW(p)

Update 3 days later: apparently most people disagree strongly with

Their romantic partner offering lots of value in other ways. I'm skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.

Most people in the comments so far emphasize some kind of mysterious "relationship stuff" as upside, but my actual main update here is that most commenters probably think the typical costs are far far lower than I imagined? Unsure, maybe the "relationship stuff" is really ridiculously high value.

So I guess it's time to get more concrete about the costs I had in mind:

A quick google search says the male is primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone are probably ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I'm generally considering the no-kids case here; I don't feel as confused about couples with kids.)
I was picturing an anxious attachment style as the typical female case (without kids). That's unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
Eyeballing Aella's relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.
Less legibly... conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they're not having much sex. For instance, there's a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is "No, he wasn't looking for a fishing rod. He came in looking for tampons, and I told him 'dude, your weekend is shot, you should go fishing!'".

(One thing to emphasize in these: sex isn't just a major value prop in its own right, I also expect that lots of the main costs of a relationship from the man's perspective are mitigated a lot by sex. Like, the sex makes the female partner behave less unpleasantly for a while.)

So, next question for people who had useful responses (especially @Lucius Bushnaq [LW · GW] and @yams [LW · GW]): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?

Replies from: Lblack, Zack_M_Davis

↑ comment by Lucius Bushnaq (Lblack) · 2025-04-01T08:58:31.203Z · LW(p) · GW(p)

A quick google search says the male is primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone are probably ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I'm generally considering the no-kids case here; I don't feel as confused about couples with kids.)

But remember that you already conditioned on 'married couples without kids'. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples. These properties seem like they'd be heavily anti-correlated.

In the subset of man-woman married couples without kids that get along, I wouldn't be surprised if having a partner effectively works out to more money for both participants, because you've got two incomes, but less than 2x living expenses.

I was picturing an anxious attachment style as the typical female case (without kids). That's unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.

I am ... not ... picturing that as the typical case? Uh, I don't know what to say here really. That's just not an image that comes to mind for me when I picture 'older hetero married couple'. Plausibly I don't know enough normal people to have a good sense of what normal marriages are like.

Eyeballing Aella's relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.

I think for many of those couples that fight multiple times a month, the alternative isn't separating and finding other, happier relationships where there are never any fights. The typical case I picture there is that the relationship has some fights because both participants aren't that great at communicating or understanding emotions, their own or other people's. If they separated and found new relationships, they'd get into fights in those relationships as well.

It seems to me that lots of humans are just very prone to getting into fights. With their partners, their families, their roommates etc., to the point that they have accepted having lots of fights as a basic fact of life. I don't think the correct takeaway from that is 'Most humans would be happier if they avoided having close relationships with other humans.'

Less legibly... conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they're not having much sex. For instance, there's a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is "No, he wasn't looking for a fishing rod. He came in looking for tampons, and I told him 'dude, your weekend is shot, you should go fishing!'".

Conventional wisdom also has it that married people often love each other so much they would literally die for their partner. I think 'conventional wisdom' is just a very big tent that has room for everything under the sun. If even 5-10% of married couples have bad relationships where the partners actively dislike each other, that'd be many millions of people in the English speaking population alone. To me, that seems like more than enough people to generate a subset of well-known conventional wisdoms talking about how awful long-term relationships are.

Case in point, I feel like I hear those particular conventional wisdoms less commonly these days in the Western world. My guess is this is because long-term heterosexual marriage is no longer culturally mandatory, so there are fewer unhappy couples around generating conventional wisdoms about their plight.

So, next question for people who had useful responses (especially @Lucius Bushnaq [LW · GW] and @yams [LW · GW]): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?

So, in summary, both I think? I feel like the 'typical' picture of a hetero marriage you sketch is more like my picture of an 'unusually terrible' marriage. You condition on a bad sexual relationship and no children and the woman doesn't earn money and the man doesn't even like her, romantically or platonically. That subset of marriages sure sounds like it'd have a high chance of the man just walking away, barring countervailing cultural pressures. But I don't think most marriages where the sex isn't great are like that.

Replies from: johnswentworth, johnswentworth

↑ comment by johnswentworth · 2025-04-01T16:39:41.585Z · LW(p) · GW(p)

This comment gave me the information I'm looking for, so I don't want to keep dragging people through it. Please don't feel obligated to reply further!

That said, I did quickly look up some data on this bit:

But remember that you already conditioned on 'married couples without kids'. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples.

... so I figured I'd drop it in the thread.

[Image: a bar chart showing that Black wives and college graduates are more likely than other wives to be in egalitarian and breadwinner wife marriages]

When interpreting these numbers, bear in mind that many couples with no kids probably intend to have kids in the not-too-distant future, so the discrepancy shown between "no children" and 1+ children is probably somewhat smaller than the underlying discrepancy of interest (which pushes marginally more in favor of Lucius' guess).

↑ comment by johnswentworth · 2025-04-01T15:57:23.266Z · LW(p) · GW(p)

Big thank you for responding, this was very helpful.

↑ comment by Zack_M_Davis · 2025-03-24T07:13:57.056Z · LW(p) · GW(p)

Not sure how much this generalizes to everyone, but part of the story (for either the behavior or the pattern of responses to the question) might be that some people are ideologically attached to believing in love: that women and men need each other as a terminal value, rather than just instrumentally using each other for resources or sex. For myself, without having any particular empirical evidence or logical counterargument to offer, the entire premise of the question just feels sad and gross. It's like you're telling me you don't understand why people try to make ghosts happy. But I want ghosts to be happy.

Replies from: johnswentworth

↑ comment by johnswentworth · 2025-03-24T15:56:37.624Z · LW(p) · GW(p)

That is useful, thanks.

Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact "most men would be happier single but are ideologically attached to believing in love", then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I've learned is that lots of people are triggered by the question, but that doesn't really tell me much about the underlying reality.

Replies from: Thane Ruthenis

↑ comment by Thane Ruthenis · 2025-03-25T12:06:04.484Z · LW(p) · GW(p)

Track record: My own cynical take [LW(p) · GW(p)] seems to be doing better with regards to not triggering people (though it's admittedly less visible).

Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much?

First off, I'm kind of confused about how you didn't see this coming. There seems to be a major "missing mood" going on in your posts on the topic – and I speak as someone who is sorta-aromantic, considers the upsides of any potential romantic relationship to have a fairly low upper bound for himself^[1], and is very much willing to entertain the idea that a typical romantic relationship is a net-negative dumpster fire.

So, obvious-to-me advice: Keep a mental model of what topics are likely very sensitive and liable to trigger people, and put in tons of caveats and "yes, I know, this is very cynical, but it's my current understanding" and "I could totally be fundamentally mistaken here".

In particular, a generalization of a piece of advice from here has been living in my head rent-free for years (edited/adapted):

Tips For Talking About Your Beliefs On Sensitive Topics
You want to make it clear that they're just your current beliefs about the objective reality, and you don't necessarily like that reality so they're not statements about how the world ought to be, and also they're not necessarily objectively correct and certainly aren't all-encompassing so you're not condemning people who have different beliefs or experiences. If you just say, "I don't understand why people do X," everyone will hear you as saying that everyone who does X is an untermensch who should be gutted and speared because in high-simulacrum-level environments disagreeing with people is viewed as a hostile act attempting to lower competing coalitions' status, and failing to furiously oppose such acts will get you depowered and killed. So be sure to be extra careful by saying something like, "It is my current belief, and I mean with respect to my own beliefs about the objective reality, that a typical romantic relationship seems flawed in lots of ways, but I stress, and this is very important, that if you feel or believe differently, then that too is a valid and potentially more accurate set of beliefs, and we don't have to OH GOD NOT THE SPEARS ARRRGHHHH!"

More concretely, here's how I would have phrased your initial post:

Rewrite

Here's a place where my model of the typical traditional romantic relationship seems to be missing something. I'd be interested to hear people's takes on what it might be.

Disclaimer: I'm trying to understand the general/stereotypical case here, i. e., what often ends up happening in practice. I'm not claiming that this is how relationships ought to be, nor that all existing relationships are like this. But on my model, most people are deeply flawed, they tend to form deeply flawed relationships, and I'd like to understand why these relationships still work out. Bottom line is, this is going to be a fairly cynical/pessimistic take (with the validity of its cynicism being something I'm willing to question).

Background claims:

My model of the stereotypical/traditional long-term monogamous hetero relationship has a lot of downsides for men. For example:
- Financial costs: Up to 50% higher living costs (since in the "traditional" template, men are the breadwinners.)
- Frequent, likely highly stressful, arguments. See Aella's relationship survey data: a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more.
- General need to manage/account for the partner's emotional issues. (My current model of the "traditional" relationship assumes the anxious attachment style for the woman, which would be unpleasant to manage.)
For hetero men, consistent sexual satisfaction is a major upside offered by a relationship, providing a large fraction of the relationship-value.
A majority of traditional relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella's data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from there). This also fits with my general models of dating: women usually find the large majority of men sexually unattractive, most women eventually settle on a guy they don't find all that sexually attractive, so it should not be surprising if that relationship ends up with very little sex after a few years.

Taking a purely utilitarian lens, for a relationship to persist, the benefits offered by it should outweigh its costs. However, on my current model, that shouldn't be the case for the average man. I expect the stated downsides to be quite costly, and if we remove consistent sex from the equation, the remaining value (again, for a stereotypical man) seems comparatively small.

So: Why do these relationships persist? Obviously the men might not have better relationship prospects, but they could just not have any relationship. The central question which my models don't have a compelling answer to is: what is making these relationships net positive value for the men, relative to not having a romantic relationship at all?

Some obvious candidate answers:

The cultural stereotypes diverge from reality in some key ways, so my model is fundamentally mistaken. E.g.:
- I'm overestimating the downsides: the arguments aren't that frequent/aren't very stressful, female partners aren't actually "high-maintenance", etc.
- I'm overestimating the value of sex for a typical man.
- I'm underestimating how much other value relationships offer men. If so: what is that "other value", concretely? (Note that it'd need to add up to quite a lot to outweigh the emotional and financial costs, under my current model.)
Kids. This one makes sense for those raising kids, but what about everyone else? Especially as fertility goes down.
The wide tail. There's plenty of cases which make sense which are individually unusual - e.g. my own parents are business partners. Maybe in aggregate all these unusual cases account for the bulk.
Loneliness. Maybe most of these guys have no one else close in their life. In this case, they'd plausibly be better off if they took the effort they invested in their romantic life and redirected to friendships (probably mostly with other guys), but there's a lot of activation energy blocking that change.
Wanting a dependent. Lots of men are pretty insecure, and having a dependent to provide for makes them feel better about themselves. This also flips the previous objection: high maintenance can be a plus if it makes a guy feel wanted/useful/valuable.
Social pressure/commitment/etc making the man stick around even though the relationship is not net positive for him.
The couple are de-facto close mostly-platonic friends, and the man wants to keep that friendship.

I'm interested in both actual data and anecdata. What am I missing here? What available evidence points strongly to some of these over others?

Obvious way to A/B test this would be to find some group of rationalist-y people who aren't reading LW/your shortform, post my version there, and see the reactions. Not sure what that place would be. (EA forum? r/rational's Friday Open Threads? r/slatestarcodex? Some Discord/Substack group?)

Adapting it for non-rationalist-y audiences (e.g., r/AskMen) would require more rewriting. Mainly, coating the utilitarian language in more, ahem, normie terms.

^{^}
Given the choice between the best possible romantic relationship and $1m, I'd pick $1m. ~~Absent munchkinry like "my ideal girlfriend is a genius alignment researcher on the level of von Neumann and Einstein".~~

↑ comment by LVSN · 2025-03-21T05:51:08.070Z · LW(p) · GW(p)

girl prety
personal desire to be worthy of being an example vindicating the hope that good guys can 'get the girl'; giving up on one means nothing will ever stay and doom is eternal

comment by johnswentworth · 2023-06-16T04:31:14.360Z · LW(p) · GW(p)

Consider two claims:

Any system can be modeled as maximizing some utility function, therefore utility maximization is not a very useful model
Corrigibility is possible, but utility maximization is incompatible with corrigibility, therefore we need some non-utility-maximizer kind of agent to achieve corrigibility

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.

I expect that many peoples' intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both the above claims to be true at first glance. But on examination, the probable-incompatibility is fairly obvious, so the two claims might make a useful test to notice when one is relying on yay/boo reasoning about utilities in an incoherent way.

Replies from: steve2152, johannes-c-mayer, Viliam, Vladimir_Nesov, jesper-norregaard-sorensen

↑ comment by Steven Byrnes (steve2152) · 2023-06-16T12:33:34.627Z · LW(p) · GW(p)

FWIW I endorse the second claim when the utility function depends exclusively on the state of the world in the distant future, whereas I endorse the first claim when the utility function can depend on anything whatsoever (e.g. what actions I’m taking right this second). (details [LW · GW])

I wish we had different terms for those two things. That might help with any alleged yay/boo reasoning.

(When Eliezer talks about utility functions, he seems to assume that it depends exclusively on the state of the world in the distant future.)

↑ comment by Johannes C. Mayer (johannes-c-mayer) · 2023-07-09T11:07:30.073Z · LW(p) · GW(p)

Expected Utility Maximization is Not Enough

Consider a homomorphically encrypted computation running somewhere in the cloud. The computations correspond to running an AGI. Now from the outside, you can still model the AGI based on how it behaves, as an expected utility maximizer, if you have a lot of observational data about the AGI (or at least let's take this as a reasonable assumption).

No matter how closely you look at the computations, you will not be able to figure out how to change these computations in order to make the AGI aligned if it was not aligned already (Also, let's assume that you are some sort of Cartesian agent, otherwise you would probably already be dead if you were running these kinds of computations).

So, my claim is not that modeling a system as an expected utility maximizer can't be useful. Instead, I claim that this model is incomplete. At least with regard to the task of computing an update to the system, such that when we apply this update to the system, it would become aligned.

Of course, you can model any system as an expected utility maximizer. But just because I can use the "high-level" conceptual model of expected utility maximization to model the behavior of a system very well does not mean that model is enough. Behavior is not the only thing that we care about; we also care about being able to understand the internal workings of the system, such that it becomes much easier to think about how to align the system.

So the following seems to be beside the point unless I am <missing/misunderstanding> something:

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.

Maybe I have missed the fact that the claim you listed says that expected utility maximization is not very useful. And I'm saying it can be useful, it might just not be sufficient at all to actually align a particular AGI system. Even if you can do it arbitrarily well.

↑ comment by Viliam · 2023-06-17T21:49:56.904Z · LW(p) · GW(p)

I am not an expert, but as I remember it, it was a claim that "any system that follows certain axioms can be modeled as maximizing some utility function". The axioms assumed that there were no circular preferences -- if someone prefers A to B, B to C, and C to A, it is impossible to define a utility function such that u(A) > u(B) > u(C) > u(A) -- and that if the system says that A > B > C, it can decide between e.g. a 100% chance of B, and a 50% chance of A with a 50% chance of C, again in a way that is consistent.
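A minimal sketch of the no-circular-preferences point, assuming strict preferences are given as (better, worse) pairs over a finite set of options. The function name `utility_from_preferences` is made up for illustration. It tries to assign numbers so that every stated preference satisfies u(better) > u(worse), and gives up when the preferences contain a cycle. It only covers the ordering part of the axioms, not the part about choosing between lotteries.

```python
from graphlib import CycleError, TopologicalSorter


def utility_from_preferences(prefs):
    """prefs: iterable of (better, worse) pairs.

    Returns a dict mapping each option to a number such that
    u(better) > u(worse) for every pair, or None if the
    preferences are circular.
    """
    sorter = TopologicalSorter()
    for better, worse in prefs:
        # Declare 'worse' as a predecessor of 'better', so that
        # 'worse' appears earlier in the order and gets a lower rank.
        sorter.add(better, worse)
    try:
        order = list(sorter.static_order())
    except CycleError:
        return None  # e.g. A > B > C > A: no consistent utility exists
    return {option: rank for rank, option in enumerate(order)}


print(utility_from_preferences([("A", "B"), ("B", "C")]))
print(utility_from_preferences([("A", "B"), ("B", "C"), ("C", "A")]))
```

The second call hits the cycle A > B > C > A and returns `None`, which is the u(A) > u(B) > u(C) > u(A) impossibility in code form.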

I am not sure how this works when the system is allowed to take current time into account, for example when it is allowed to prefer A to B on Monday but prefer B to A on Tuesday. I suppose that in such situation any system can trivially be modeled by a utility function that at each moment assigns utility 1 to what the system actually did in that moment, and utility 0 to everything else.

Corrigibility is incompatible with assigning utility to everything in advance. A system that has preferences about future will also have a preference about not having its utility function changed. (For the same reason people have a preference not to be brainwashed, or not to take drugs, even if after brainwashing they are happy about having been brainwashed, and after getting addicted they do want more drugs.)

Corrigible system would be like: "I prefer A to B at this moment, but if humans decide to fix me and make me prefer B to A, then I prefer B to A". In other words, it doesn't have values for u(A) and u(B), or it doesn't always act according to those values. A consistent system that currently prefers A to B would prefer not to be fixed.

Replies from: steve2152

↑ comment by Steven Byrnes (steve2152) · 2023-06-17T22:58:51.954Z · LW(p) · GW(p)

I think John's 1st bullet point was referring to an argument you can find in https://www.lesswrong.com/posts/NxF5G6CJiof6cemTw/coherence-arguments-do-not-entail-goal-directed-behavior [LW · GW] and related.

↑ comment by Vladimir_Nesov · 2023-06-16T09:16:43.802Z · LW(p) · GW(p)

A utility function represents preference elicited in a large collection of situations, each a separate choice between events that happens with incomplete information, as an event is not a particular point. This preference needs to be consistent across different situations to be representable by expected utility of a single utility function.

Once formulated, a utility function can be applied to a single choice/situation, such as a choice of a policy. But a system that only ever makes a single choice is not a natural fit for expected utility frame, and that's the kind of system that usually appears in "any system can be modeled as maximizing some utility function". So it's not enough to maximize something once, or in a narrow collection of situations, the situations the system is hypothetically exposed to need to be about as diverse as choices between any pair of events, with some of the events very large, corresponding to unreasonably incomplete information, all drawn across the same probability space.

One place this mismatch of frames happens is with updateless decision theory. An updateless decision is a choice of a single policy, once and for all, so there is no reason for it to be guided by expected utility [LW(p) · GW(p)], even though it could be. The utility function for the updateless choice of policy would then need to be obtained elsewhere, in a setting that has all these situations with separate (rather than all enacting a single policy) and mutually coherent choices under uncertainty. But once an updateless policy is settled (by a policy-level decision), actions implied by it (rather than action-level decisions in expected utility frame) no longer need to be coherent. Not being coherent, they are not representable by an action-level utility function.

So by embracing updatelessness, we lose the setting that would elicit utility if the actions were instead individual mutually coherent decisions. And conversely, by embracing coherence of action-level decisions, we get an implied policy that's not updatelessly optimal with respect to the very precise outcomes determined by any given whole policy. So an updateless agent founded on expected utility maximization implicitly references a different non-updateless agent whose preference is elicited by making separate action-level decisions under a much greater uncertainty than the policy-level alternatives the updateless agent considers.

↑ comment by JNS (jesper-norregaard-sorensen) · 2023-06-16T06:40:35.756Z · LW(p) · GW(p)

Completely off the cuff take:

I don't think claim 1 is wrong, but it does clash with claim 2.

That means any system that has to be corrigible cannot be a system that maximizes a simple utility function (1 dimension), or put another way "whatever utility function it maximizes must be along multiple dimensions".

Which seems to be pretty much what humans do, we have really complex utility functions, and everything seems to be ever changing and we have some control over it ourselves (and sometimes that goes wrong and people end up maxing out a singular dimension at the cost of everything else).

Note to self: Think more about this and if possible write up something more coherent and explanatory.

comment by johnswentworth · 2021-02-16T18:27:16.515Z · LW(p) · GW(p)

One second-order effect of the pandemic which I've heard talked about less than I'd expect:

This is the best proxy I found on FRED for new businesses founded in the US, by week. There was a mild upward trend over the last few years, but it's really taken off lately. Not sure how much of this is kids who would otherwise be in college, people starting side gigs while working from home, people quitting their jobs and starting their own businesses so they can look after the kids, extra slack from stimulus checks, people losing their old jobs en masse but still having enough savings to start a business, ...

For the stagnation-hypothesis folks who lament relatively low rates of entrepreneurship today, this should probably be a big deal.

Replies from: gwern, Gunnar_Zarncke

↑ comment by gwern · 2021-02-18T02:37:45.633Z · LW(p) · GW(p)

How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the 'making fast food in a stall in a Third World country' sort of 'startup', which make essentially no or negative long-term contributions).

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-02-18T04:09:19.668Z · LW(p) · GW(p)

Good question. I haven't seen particularly detailed data on these on FRED, but they do have separate series for "high propensity" business applications (businesses they think are likely to hire employees), business applications with planned wages, and business applications from corporations, as well as series for each state. The spike is smaller for planned wages, and nonexistent for corporations, so the new businesses are probably mostly single proprietors or partnerships. Other than that, I don't know what the breakdown looks like across industries.

Replies from: gwern

↑ comment by gwern · 2024-02-16T00:14:41.439Z · LW(p) · GW(p)

How do you feel about this claim now? I haven't noticed a whole lot of innovation coming from all these small businesses, and a lot of them seem like they were likely just vehicles for the extraordinary extent of fraud as the results from all the investigations & analyses come in.

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-02-16T00:26:06.777Z · LW(p) · GW(p)

Well, it wasn't just a temporary bump:

... so it's presumably also not just the result of pandemic giveaway fraud, unless that fraud is ongoing.

Presumably the thing to check here would be TFP, but FRED's US TFP series currently only goes to end of 2019, so apparently we're still waiting on that one? Either that or I'm looking at the wrong series.

↑ comment by Gunnar_Zarncke · 2021-02-16T21:46:19.510Z · LW(p) · GW(p)

Somebody should post this on Paul Graham's twitter. He would be very interested in it (I can't): https://mobile.twitter.com/paulg

comment by johnswentworth · 2020-10-14T17:27:36.473Z · LW(p) · GW(p)

Neat problem of the week: researchers just announced roughly-room-temperature superconductivity at pressures around 270 GPa. That's stupidly high pressure - a friend tells me "they're probably breaking a diamond each time they do a measurement". That said, pressures in single-digit GPa do show up in structural problems occasionally, so achieving hundreds of GPa scalably/cheaply isn't that many orders of magnitude away from reasonable, it's just not something that there's historically been much demand for. This problem plays with one idea for generating such pressures in a mass-producible way.

Suppose we have three materials in a coaxial wire:

innermost material has a low thermal expansion coefficient and high Young's modulus (i.e. it's stiff)
middle material is a thin cylinder of our high-temp superconducting concoction
outermost material has a high thermal expansion coefficient and high Young's modulus.

We construct the wire at high temperature, then cool it. As the temperature drops, the innermost material stays roughly the same size (since it has low thermal expansion coefficient), while the outermost material shrinks, so the superconducting concoction is squeezed between them.

Exercises:

Find an expression for the resulting pressure in the superconducting concoction in terms of the Young's moduli, expansion coefficients, temperature change, and dimensions of the inner and outer materials. (Assume the width of the superconducting layer is negligible, and the outer layer doesn't break.)
Look up parameters for some common materials (e.g. steel, tungsten, copper, porcelain, aluminum, silicon carbide, etc), and compute the pressures they could produce with reasonable dimensions (assuming that their material properties don't change too dramatically with such high pressures).
Find an expression for the internal tension as a function of radial distance in the outermost layer.
Pick one material, look up its tensile strength, and compute how thick it would have to be to serve as the outermost layer without breaking, assuming the superconducting layer is at 270 GPa.
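Before working the exercises, it may help to have a crude sense of scale. The sketch below is not a solution to any of them: it uses the textbook estimate for fully constrained thermal stress, stress ~ E * (difference in expansion coefficients) * (temperature change), and ignores all geometry, yielding, and pressure-dependence of the material properties. The function name and the example numbers (a steel-like modulus of about 200 GPa, a mismatch of 1e-5 per kelvin, 1000 K of cooling) are illustrative assumptions, not values from the post.

```python
def mismatch_stress_scale_gpa(youngs_modulus_gpa, delta_alpha_per_k, delta_t_k):
    """Order-of-magnitude stress from thermal expansion mismatch, in GPa.

    Crude scale only: E * delta_alpha * delta_T, with no geometric factors.
    """
    return youngs_modulus_gpa * delta_alpha_per_k * delta_t_k


# Assumed, illustrative inputs (not taken from the post):
scale = mismatch_stress_scale_gpa(
    youngs_modulus_gpa=200.0,   # roughly steel
    delta_alpha_per_k=1e-5,     # difference in expansion coefficients
    delta_t_k=1000.0,           # temperature drop after construction
)
target_gpa = 270.0
print(f"stress scale: {scale:.1f} GPa, target/scale: {target_gpa / scale:.0f}x")
```

Under these assumed numbers the scale lands in the single-digit GPa range, so the exercises are largely about how much the choice of materials and dimensions can close the remaining gap.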

comment by johnswentworth · 2021-03-12T16:53:19.267Z · LW(p) · GW(p)

Brief update on how it's going with RadVac.

I've been running ELISA tests all week. In the first test, I did not detect stronger binding to any of the peptides than to the control in any of several samples from myself or my girlfriend. But the control itself was looking awfully suspicious, so I ran another couple tests. Sure enough, something in my samples is binding quite strongly to the control itself (i.e. the blocking agent), which is exactly what the control is supposed to not do. So I'm going to try out some other blocking agents, and hopefully get an actually-valid control group.

(More specifics on the test: I ran a control with blocking agent + sample, and another with blocking agent + blank sample, and the blocking agent + sample gave a strong positive signal while the blank sample gave nothing. That implies something in the sample was definitely binding to both the blocking agent and the secondary antibodies used in later steps, and that binding was much stronger than the secondary antibodies themselves binding to anything in the blocking agent + blank sample.)

In other news, the RadVac team released the next version of their recipe + whitepaper. Particularly notable:

... many people who have taken the nasal vaccine are testing negative for serum antibodies with commercial and lab ELISA tests, while many who inject the vaccine (subcutaneous or intramuscular) are testing positive (saliva testing appears to be providing evidence of mucosal response among a subset of researchers who have administered the vaccine intranasally).

Note that they're talking specifically about serum (i.e. blood) antibodies here. So apparently injecting it does induce blood antibodies of the sort detectable by commercial tests (at least some of the time), but snorting it mostly just produces mucosal antibodies (also at least some of the time).

This is a significant update: most of my prior on the vaccine working was based on vague comments in the previous radvac spec about at least some people getting positive test results. But we didn't know what kind of test results those were, so there was a lot of uncertainty about exactly what "working" looked like. In particular, we didn't know whether antibodies were induced in blood or just mucus, and we didn't know if they were induced consistently or only in some people (the latter of which is the "more dakka probably helps" world). Now we know that it's mostly just mucus (at least for nasal administration). Still unsure about how consistently it works - the wording in the doc makes it sound like only some people saw a response, but I suspect the authors are just hedging because they know there's both selection effects and a lot of noise in the data which comes back to them.

The latest version of the vaccine has been updated to give it a bit more kick - slightly higher dose, and the chitosan nanoparticle formula has been changed in a way which should make the peptides more visible to the immune system. Also, the list of peptides has been trimmed down a bit, so the latest version should actually be cheaper, though the preparation is slightly more complex.

Replies from: ChristianKl

↑ comment by ChristianKl · 2021-03-13T22:23:28.640Z · LW(p) · GW(p)

but I suspect the authors are just hedging because they know there's both selection effects and a lot of noise in the data which comes back to them.

I would expect that hedging also happens because making definitive clinical claims has more danger from the FDA than making hedged statements.

comment by johnswentworth · 2023-06-01T18:15:15.604Z · LW(p) · GW(p)

So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books and Eliezer’s commentary on ASC's latest linkpost, and I have cached thoughts on the matter.

My cached thoughts start with a somewhat different question - not "what role does magic play in fantasy fiction?" (e.g. what fantasies does it fulfill), but rather... insofar as magic is a natural category, what does it denote? So I'm less interested in the relatively-expansive notion of "magic" sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern called "magic" which recurs among tons of real-world ancient cultures.

Claim (weakly held): the main natural category here is symbols changing the territory. Normally symbols represent the world, and changing the symbols just makes them not match the world anymore - it doesn't make the world do something different. But if the symbols are "magic", then changing the symbols changes the things they represent in the world. Canonical examples:

Wizard/shaman/etc draws magic symbols, speaks magic words, performs magic ritual, or even thinks magic thoughts, thereby causing something to happen in the world.
Messing with a voodoo doll messes with the person it represents.
"Sympathetic" magic, which explicitly uses symbols of things to influence those things.
Magic which turns emotional states into reality.

I would guess that most historical "magic" was of this type.

comment by johnswentworth · 2021-11-24T00:34:50.628Z · LW(p) · GW(p)

Everybody's been talking about Paxlovid, and how ridiculous it is to both stop the trial since it's so effective but also not approve it immediately. I want to at least float an alternative hypothesis, which I don't think is very probable at this point, but does strike me as at least plausible (like, 20% probability would be my gut estimate) based on not-very-much investigation.

Early stopping is a pretty standard p-hacking technique. I start out planning to collect 100 data points, but if I manage to get a significant p-value with only 30 data points, then I just stop there. (Indeed, it looks like the Paxlovid study only had 30 actual data points, i.e. people hospitalized.) Rather than only getting "significance" if all 100 data points together are significant, I can declare "significance" if the p-value drops below the line at any time. That gives me a lot more choices in the garden of forking counterfactual paths.
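To make the mechanism concrete, here is a minimal simulation sketch under assumptions of my own choosing: the data are pure noise (standard normal, so the null hypothesis is true), the test is a two-sided z-test with known variance at the usual 1.96 cutoff, and the "peeking" experimenter checks after every data point from the 10th through the 100th. None of this is modeled on any particular trial.

```python
import math
import random


def simulate(trials=2000, n_max=100, n_min=10, z_crit=1.96, seed=0):
    """Return (fixed-n false positive rate, peeking false positive rate)."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(trials):
        total = 0.0
        crossed_at_some_point = False
        z = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)   # null is true: mean really is 0
            z = total / math.sqrt(n)       # z-statistic after n points
            if n >= n_min and abs(z) > z_crit:
                crossed_at_some_point = True
        if crossed_at_some_point:
            peeking_hits += 1              # "significant" at any peek
        if abs(z) > z_crit:
            fixed_hits += 1                # significant only at n = n_max
    return fixed_hits / trials, peeking_hits / trials


fixed_rate, peeking_rate = simulate()
print(f"fixed n=100: {fixed_rate:.3f}   stop at first significant peek: {peeking_rate:.3f}")
```

The fixed-sample rate should come out near the nominal 5%, while the peeking rate comes out several times higher, even though there is no real effect in any simulated trial.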

Now, success rates on most clinical trials are not very high. (They vary a lot by area - most areas are about 15-25%. Cancer is far and away the worst, below 4%, and vaccines are the best, over 30%.) So I'd expect that p-hacking is a pretty large chunk of approved drugs, which means pharma companies are heavily selected for things like finding-excuses-to-halt-good-seeming-trials-early.

Replies from: gwern

↑ comment by gwern · 2021-11-24T01:28:04.752Z · LW(p) · GW(p)

Early stopping is a pretty standard p-hacking technique.

It was stopped after a pre-planned interim analysis; that means they're calculating the stopping criteria/p-values with multiple testing correction built in, using sequential analysis.
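A rough sketch of what "multiple testing correction built in" buys, under assumptions that are mine rather than the trial's: five equally spaced pre-planned looks, pure-noise data, a two-sided z-test with known variance, and a single stricter cutoff shared by all looks (a Pocock-style boundary; 2.413 is the tabulated constant for five looks at overall 5%, as I recall it). Real trials often use other boundary shapes, so this only illustrates the principle.

```python
import math
import random


def false_positive_rate(z_crit, looks=(20, 40, 60, 80, 100), trials=4000, seed=1):
    """Fraction of null trials that cross |z| > z_crit at any planned look."""
    rng = random.Random(seed)
    look_set = set(looks)
    n_max = max(looks)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)   # null is true: no real effect
            if n in look_set and abs(total / math.sqrt(n)) > z_crit:
                hits += 1                  # trial stopped early as "significant"
                break
    return hits / trials


naive = false_positive_rate(1.96)       # uncorrected cutoff reused at every look
corrected = false_positive_rate(2.413)  # assumed Pocock-style cutoff for 5 looks
print(f"uncorrected: {naive:.3f}   corrected: {corrected:.3f}")
```

With the uncorrected cutoff the overall false positive rate is well above 5%; with the stricter shared cutoff it comes back to roughly the nominal level, which is the sense in which a pre-planned interim analysis is not the same thing as ad hoc early stopping.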

comment by johnswentworth · 2024-02-13T23:01:12.945Z · LW(p) · GW(p)

Here's an AI-driven external cognitive tool I'd like to see someone build, so I could use it.

This would be a software tool, and the user interface would have two columns. In one column, I write. Could be natural language (like google docs), or code (like a normal IDE), or latex (like overleaf), depending on what use-case the tool-designer wants to focus on. In the other column, a language and/or image model provides local annotations for each block of text. For instance, the LM's annotations might be:

(Natural language or math use-case:) Explanation or visualization of a mental picture generated by the main text at each paragraph
(Natural language use-case:) Emotional valence at each paragraph
(Natural language or math use-case:) Some potential objections tracked at each paragraph
(Code:) Fermi estimates of runtime and/or memory usage

This is the sort of stuff I need to track mentally in order to write high-quality posts/code/math, so it would potentially be very high value to externalize that cognition.
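A minimal sketch of the core loop such a tool might run, with everything named here being hypothetical: `annotate`, `ANNOTATION_PROMPTS`, and `fake_model` are made up for illustration, and `model` is just any function from a prompt string to a response string (a real version would call whatever LLM API the tool-designer picks). It splits the text into blank-line-separated blocks and produces one set of annotations per block, which is the data a two-column UI would render.

```python
from typing import Callable, Dict, List

# Illustrative prompt templates, one per annotation type (wording is an assumption).
ANNOTATION_PROMPTS: Dict[str, str] = {
    "mental picture": "Please describe what you mentally picture when reading the following block of text:",
    "objections": "List some potential objections a reader might raise to the following block of text:",
}


def annotate(
    text: str,
    model: Callable[[str], str],
    prompts: Dict[str, str] = ANNOTATION_PROMPTS,
) -> List[dict]:
    """Return one row per block: the block text plus its annotations."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    rows = []
    for block in blocks:
        notes = {
            name: model(f'{prompt}\n"\n{block}\n"')
            for name, prompt in prompts.items()
        }
        rows.append({"text": block, "annotations": notes})
    return rows


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, so the sketch runs offline."""
    return f"[model output for a {len(prompt)}-character prompt]"


for row in annotate("First paragraph.\n\nSecond paragraph.", fake_model):
    print(row["text"], "|", row["annotations"])
```

Each block is annotated independently here; annotations that need context from earlier blocks (such as tracking objections across a whole post) would need the preceding text included in the prompt.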

Also, the same product could potentially be made visible to readers (for the natural language/math use-cases) to make more visible the things the author intends to be mentally tracked. That, in turn, would potentially make it a lot easier for readers to follow e.g. complicated math.

Replies from: None

↑ comment by [deleted] · 2024-02-14T00:56:42.463Z · LW(p) · GW(p)

Can you share your prompts and if you consider the output satisfactory for some example test cases?

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-02-14T17:12:05.939Z · LW(p) · GW(p)

I haven't experimented very much, but here's one example prompt.

Please describe what you mentally picture when reading the following block of text:
"
A Shutdown Problem Proposal
First things first: this is not (yet) aimed at solving the whole corrigibility problem, or even the whole shutdown problem.
The main thing this proposal is intended to do is to get past the barriers MIRI found in their old work on the shutdown problem. In particular, in a toy problem basically-identical to the one MIRI used, we want an agent which:
Does not want to manipulate the shutdown button
Does respond to the shutdown button
Does want to make any child-agents it creates responsive-but-not-manipulative to the shutdown button, recursively (i.e. including children-of-children etc)
If I understand correctly, this is roughly the combination of features which MIRI had the most trouble achieving simultaneously.
"

This one produced basically-decent results from GPT-4.

Although I don't have the exact prompt on hand at the moment, I've also asked GPT-4 to annotate a piece of code line-by-line with a Fermi estimate of its runtime, which worked pretty well.

Replies from: None

↑ comment by [deleted] · 2024-02-14T17:48:17.632Z · LW(p) · GW(p)

Yeah, I was thinking your specs were, well

Wrap GPT-4 and Gemini, columned output over a set of text, applying prompts to each section? Prototype in a weekend.
Make the AI able to meaningfully contribute non-obvious comments to help someone who already is an expert?

https://xkcd.com/1425/

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-02-14T18:02:47.591Z · LW(p) · GW(p)

Don't really need comments which are non-obvious to an expert. Part of what makes LLMs well-suited to building external cognitive tools is that external cognitive tools can create value by just tracking "obvious" things, thereby freeing up the user's attention/working memory for other things.

Replies from: Viliam

↑ comment by Viliam · 2024-02-15T10:06:56.270Z · LW(p) · GW(p)

So kinda like spellcheckers (most typos you could figure out, but why spend time and attention on proofreading if the program can do that for you), but... thought-checkers.

Like, if a part of your article contradicts another part, it would be underlined.

Replies from: gwern

↑ comment by gwern · 2024-02-15T22:55:47.273Z · LW(p) · GW(p)

I've long wanted this, but it's not clear how to do it. Long-context LLMs are still expensive and for authors who need it most, context windows are still too small: me or Yudkowsky, for example, would still exceed the context window of almost all LLMs except possibly the newest Gemini. And then you have their weak reasoning. You could try to RAG it, but embeddings are not necessarily tuned to encode logically contradictory or inconsistent claims: probably if I wrote "the sky is blue" in one place and "the sky is red" in another, a retrieval would be able to retrieve both paragraphs and a LLM point out that they are contradictory, but such blatant contradictions are probably too rare to be useful to check for. You want something more subtle, like where you say "the sky is blue" and elsewhere "I looked up from the ground and saw the color of apples". You could try to brute force it and consider every pairwise comparison of 2 reasonable sized chunks of text and ask for contradictions, but this is quadratic and will get slow and expensive and probably turn up too many false positives. (And how do you screen off false positives and mark them 'valid'?)
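To put a number on "quadratic", here is a small sketch that counts how many LLM calls the brute-force pairwise approach would need. The corpus sizes and the 500-word chunk size are illustrative assumptions, and `pairwise_calls` is a made-up helper name.

```python
import math


def pairwise_calls(total_words: int, chunk_words: int) -> int:
    """Number of unordered chunk pairs, i.e. one comparison call per pair."""
    chunks = math.ceil(total_words / chunk_words)
    return math.comb(chunks, 2)


for words in (1_000_000, 2_000_000):
    calls = pairwise_calls(words, chunk_words=500)
    print(f"{words:,} words -> {calls:,} pairwise comparisons")
```

Doubling the corpus roughly quadruples the number of comparisons, and even a small false-positive rate per comparison multiplies into a large pile of spurious flags at this scale.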

My general thinking these days is that these truly useful 'tools for thought' LLMs are going to require either much better & cheaper LLMs, so smart that they can provide useful assistance despite being used in a grossly unnatural way input-wise or safety-tuned to hell, or biting the bullet of finetuning/dynamic-evaluation (see my Nenex proposal).

An LLM finetuned on my corpus can hope to quickly find, with good accuracy, contradictions because it was trained to know 'the sky was blue' when I wrote that at the beginning of the corpus, and it gets confused when it hits 'the color of ____' and it gets the prediction totally wrong. And RAG on an embedding tailored to the corpus can hope to surface the contradictions because it sees the two uses are the same in the essays' context, etc. (And if you run them locally, and they don't need a large context window because of the finetuning, they will be fast and cheap, so you can more meaningfully apply the brute force approach; or you could just run multiple epochs on your data, with an auxiliary prompt asking for a general critique, which would cover contradictions. 'You say here X, but don't I recall you saying ~X back at the beginning? What gives?')

Replies from: Viliam, None

↑ comment by Viliam · 2024-02-16T15:25:44.797Z · LW(p) · GW(p)

Perhaps you could do it in multiple steps.

Feed it a shorter text (that fits in the window) and ask it to provide a short summary focusing on factual statements. Then hopefully all short versions could fit in the window. Find the contradiction -- report the two contradicting factual statements and which section they appeared in. Locate the statement in the original text.
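The steps above could be wired together roughly like this. It is only a sketch: `find_contradiction` is a made-up name, and `summarize` and `compare` stand for LLM calls that are passed in as plain functions. The toy stand-ins below exist only so the example runs; they catch nothing subtler than "X is A" versus "X is B".

```python
from typing import Callable, List, Optional, Tuple


def find_contradiction(
    sections: List[str],
    summarize: Callable[[str], str],
    compare: Callable[[List[str]], Optional[Tuple[int, int]]],
) -> Optional[Tuple[int, int]]:
    """Summarize each section on its own, then run one check over all summaries.

    Returns the indices of two sections whose summaries conflict (so the
    statements can be located in the original text), or None.
    """
    summaries = [summarize(section) for section in sections]
    return compare(summaries)


def toy_summarize(section: str) -> str:
    """Toy stand-in for an LLM summary: keep only the first sentence."""
    return section.split(".")[0].strip()


def toy_compare(summaries: List[str]) -> Optional[Tuple[int, int]]:
    """Toy stand-in for an LLM check: same subject, different 'is' claim."""
    seen = {}
    for index, summary in enumerate(summaries):
        subject, separator, claim = summary.partition(" is ")
        if not separator:
            continue  # not of the form "X is Y"; the toy check ignores it
        if subject in seen and seen[subject][1] != claim:
            return seen[subject][0], index
        seen.setdefault(subject, (index, claim))
    return None


sections = [
    "The sky is blue. More text here.",
    "Apples are tasty. Unrelated.",
    "The sky is red. Even more text.",
]
print(find_contradiction(sections, toy_summarize, toy_compare))
```

The obvious weak point is the summarization step: a contradiction only gets caught if both of the conflicting statements survive into their summaries.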

↑ comment by [deleted] · 2024-02-15T23:21:15.989Z · LW(p) · GW(p)

Did you write more than 7 million words yet @gwern? https://www.google.com/amp/s/blog.google/technology/ai/google-gemini-next-generation-model-february-2024/amp/

Basically it's the "lazy wait" calculation. Get something to work now or wait until the 700k or 7m word context window ships.

Replies from: gwern

↑ comment by gwern · 2024-02-16T00:35:38.557Z · LW(p) · GW(p)

I may have. Just gwern.net is, I think, somewhere around 2m, and it's not comprehensive. Also, for contradictions, I would want to detect contradictions against citations/references as well (detecting miscitations would be more important than self-consistency IMO), and as a rough ballpark, the current Gwern.net annotation* corpus is approaching 4.3m words, looks like, and is also not comprehensive. So, closer than one might think! (Anyway, doesn't deal with the cost or latency: as you can see in the demos, we are talking minutes, not seconds, for these million-token calls and the price is probably going to be in the dollar+ regime per call.)

* which are not fulltext. It would be nice to throw in all of the hosted paper & book & webpage fulltexts, but then that's probably more like 200m+ words.

Replies from: ryan_greenblatt

↑ comment by ryan_greenblatt · 2024-02-16T01:20:14.137Z · LW(p) · GW(p)

minutes

There isn't any clear technical obstruction to getting this time down pretty small with more parallelism.

Replies from: gwern

↑ comment by gwern · 2024-02-16T03:31:46.262Z · LW(p) · GW(p)

There may not be any 'clear' technical obstruction, but it has failed badly in the past. 'Add more parallelism' (particularly hierarchically) is one of the most obvious ways to improve attention, and people have spent the past 5 years failing to come up with efficient attentions that do anything but move along a Pareto frontier from 'fast but doesn't work' to 'slow and works only as well as the original dense attention'. It's just inherently difficult to know what tokens you will need across millions of tokens without input from all the other tokens (unless you are psychic), implying extensive computation of some sort, which makes things inherently serial and costs you latency, even if you are rich enough to spend compute like water. You'll note that when Claude-2 was demoing the ultra-long attention windows, it too spent a minute or two churning. While the most effective improvements in long-range attention like Flash Attention or Ring Attention are just hyperoptimizing dense attention, which is inherently limited.

comment by johnswentworth · 2021-10-01T20:00:22.804Z · LW(p) · GW(p)

I've long been very suspicious of aggregate economic measures like GDP. But GDP is clearly measuring something, and whatever that something is it seems to increase remarkably smoothly despite huge technological revolutions. So I spent some time this morning reading up and playing with numbers and generally figuring out how to think about the smoothness of GDP increase.

Major takeaways:

When new tech makes something previously expensive very cheap, GDP mostly ignores it. (This happens in a subtle way related to how we actually compute it.)
- Historical GDP curves mainly measure things which are expensive ~now. Things which are cheap now are mostly ignored. In other words: GDP growth basically measures the goods whose production is revolutionized the least.
Re: AI takeoff, the right way to extrapolate today's GDP curve to post-AI is to think about things which will still be scarce post-AI, and then imagine the growth of production of those things.
- Even a very sharp, economically-revolutionary AI takeoff could look like slow smooth GDP growth, because GDP growth will basically only measure the things whose production is least revolutionized.
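A toy illustration of the first takeaway, as a rough sketch rather than how any statistical agency actually does it: assume real GDP values each year's quantities at one fixed base year's prices (real methods use chained indices), and assume a made-up two-good economy in which "compute" is revolutionized while "haircuts" barely change. All names and numbers here are hypothetical.

```python
# Toy two-good economy; every number is made up for illustration.
# "compute" is revolutionized (quantity x1000, price /1000); "haircuts" barely change.
quantities = {
    "early": {"compute": 1.0, "haircuts": 100.0},
    "late": {"compute": 1000.0, "haircuts": 120.0},
}
prices = {
    "early": {"compute": 100.0, "haircuts": 1.0},
    "late": {"compute": 0.1, "haircuts": 1.0},
}


def real_gdp(year, base_year):
    """Value one year's quantities at a fixed base year's prices (simplifying assumption)."""
    return sum(quantities[year][good] * prices[base_year][good] for good in quantities[year])


def growth_factor(base_year):
    """Ratio of late to early real GDP, both measured at base_year prices."""
    return real_gdp("late", base_year) / real_gdp("early", base_year)


growth_at_late_prices = growth_factor("late")
growth_at_early_prices = growth_factor("early")
print(growth_at_late_prices, growth_at_early_prices)
```

Measured at the late (recent) prices, growth comes out at roughly 2x and is driven almost entirely by haircuts; measured at the early prices it is roughly 500x and driven by compute. The smooth curve is the first of those.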

Why am I harping on about technicalities of GDP? Well, I hear about some AI forecasts which are heavily based on the outside view that economic progress (as measured by GDP) is smooth, and this is so robust historically that we should expect it to continue going forward. And I think this is basically right - GDP, as we actually compute it, is so remarkably smooth that we should expect that to continue. Alas, this doesn't tell us very much about how crazy or sharp AI takeoff will be, because GDP (as we actually compute it) systematically ignores anything that's revolutionized.

Replies from: johnswentworth, mark-xu

↑ comment by johnswentworth · 2021-10-01T20:01:00.784Z · LW(p) · GW(p)

If you want a full post on this, upvote this comment.

Replies from: adamzerner, Raemon

↑ comment by Adam Zerner (adamzerner) · 2021-10-01T20:59:32.308Z · LW(p) · GW(p)

In writing How much should we value life? [LW · GW], I spent some time digging into AI timeline stuff. It led me to When Will AI Be Created?, written by Luke Muehlhauser for MIRI. He noted that there is reason not to trust expert opinions on AI timelines, and that trend extrapolation may be a good alternative. This point you're making about GDP seems like it is real progress towards coming up with a good way to do trend extrapolation, and thus seems worth a full post IMO. (Assuming it isn't already well known by the community or something, which I don't get the sense is the case.)

↑ comment by Raemon · 2021-10-01T20:07:04.408Z · LW(p) · GW(p)

Upvoted, but I mostly trust you to write the post if it seems like there's an interesting meaty thing worth saying.

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-10-01T20:30:04.043Z · LW(p) · GW(p)

Eh, these were the main takeaways, the post would just be more details and examples so people can see the gears behind it.

↑ comment by Mark Xu (mark-xu) · 2021-10-01T22:18:17.249Z · LW(p) · GW(p)

A similar point is made by Korinek in his review of Could Advanced AI Drive Explosive Economic Growth:

My first reaction to the framing of the paper is to ask: growth in what? It’s important to keep in mind that concepts like “gross domestic product” and “world gross domestic product” were defined from an explicit anthropocentric perspective - they measure the total production of final goods within a certain time period. Final goods are what is either consumed by humans (e.g. food or human services) or what is invested into “capital goods” that last for multiple periods (e.g. a server farm) to produce consumption goods for humans.

Now imagine you are a highly intelligent AI system running on the cloud. Although the production of the server farms on which you depend enters into human GDP (as a capital good), most of the things that you absorb, for example energy, server maintenance, etc., count as “intermediate goods” in our anthropocentric accounting systems and do not contribute to human GDP. In fact, to the extent that the AI system drives up the price of scarce resources (like energy) consumed by humans, real human GDP may even decline.

As a result, it is conceivable (and, to be honest, one of the central scenarios for me personally) that an AI take-off occurs but anthropocentric GDP measures show relative stagnation in the human economy.

To make this scenario a bit more tangible, consider the following analogy: imagine a world in which there are two islands trading with each other, but the inhabitants of the islands are very different from each other - let’s call them humans and AIs. The humans sell primitive goods like oil to the AIs and their level of technology is relatively stagnant. The AIs sell amazing services to the humans, and their level of technology doubles every year. However, the AI services that humans consume make up only a relatively small part of the human consumption basket. The humans are amazed at what fantastic services they get from the AIs in exchange for their oil, and they experience improvements in their standard of living from these fantastic AI services, although they also have to pay more and more for their energy use every year, which offsets part of that benefit. The humans can only see what’s happening on their own island and develop a measure of their own well-being that they call human GDP, which increases modestly because the advances only occur in a relatively small part of their consumption basket. The AIs can see what’s going on on the AI island and develop a measure of their own well-being which they call AI GDP, and which almost doubles every year. The system can go on like this indefinitely.

For a fuller discussion of these arguments, let me refer you to my working paper on “The Rise of Artificially Intelligent Agents” (with the caveat that the paper is still a working draft).

Replies from: mark-xu

↑ comment by Mark Xu (mark-xu) · 2021-10-01T22:20:53.996Z · LW(p) · GW(p)

In general, Baumol-type effects (spending decreasing in sectors where productivity goes up) mean that we can have scenarios in which the economy is growing extremely fast on "objective" metrics like energy consumption, but GDP has stagnated because that energy is being spent on extremely marginal increases in goods being bought and sold.

comment by johnswentworth · 2021-07-27T18:55:32.511Z · LW(p) · GW(p)

[Epistemic status: highly speculative]

Smoke from California/Oregon wildfires reaching the East Coast opens up some interesting new legal/political possibilities. The smoke is way outside state borders, all the way on the other side of the country, so that puts the problem pretty squarely within federal jurisdiction. Either a federal agency could step in to force better forest management on the states, or a federal lawsuit could be brought for smoke-induced damages against California/Oregon. That would potentially make it a lot more difficult for local homeowners to block controlled burns.

comment by johnswentworth · 2020-03-01T23:37:46.814Z · LW(p) · GW(p)

Someone should write a book review of The Design of Everyday Things aimed at LW readers, so I have a canonical source to link to other than the book itself.

comment by johnswentworth · 2021-03-02T00:34:06.248Z · LW(p) · GW(p)

I had a shortform post pointing out the recent big jump in new businesses in the US, and Gwern replied [LW(p) · GW(p)]:

How sure are you that the composition is interesting? How many of these are just quick mask-makers or sanitizer-makers, or just replacing restaurants that have now gone out of business? (ie very low-value-added companies, of the 'making fast food in a stall in a Third World country' sort of 'startup', which make essentially no or negative long-term contributions).

This was a good question in context, but I disagree with Gwern's model of where-progress-comes-from, especially in the context of small businesses.

Let's talk ice-cream cones.

As the story goes, an ice-cream vendor was next door to a waffle vendor at the 1904 World's Fair. At some point, the ice-cream vendor ran short on paper cups, and inspiration struck. He bought some thin waffles from the waffle vendor, rolled them into cones, and ice-cream cones took off.

That's just the first step. From there, the cone spread memetically. People heard about it, and either asked for cones (on the consumer side) or tried making them (on the supplier side).

Insight + Memetics -> Better Food

When I compare food today to the stuff my grandparents ate, there's no comparison. Today's dishes are head and shoulders better. Partly it's insights like ice-cream cones, partly it's memetic spread of dishes from more parts of the world (like sisig, soup dumplings, ropa vieja, chicken Karahi, ...).

Those little fast-food stalls? They're powerhouses of progress. It's a hypercompetitive market, with low barriers to entry, and lots of repeat business. The conditions are ideal for trying out new dishes, spreading culinary ideas and finding out the hard way what people like to eat. That doesn't mean they're highly profitable - culinary innovation spreads memetically, so it's hard to capture the gains. But progress is made.

Replies from: ChristianKl

↑ comment by ChristianKl · 2021-03-02T20:39:49.875Z · LW(p) · GW(p)

The pandemic also has the effect of showing the kind of business ideas people try. It pushes a lot of innovation in food delivery. Some of the pandemic-driven innovation will become worthless once the pandemic is over, but a few good ideas likely survive, and the old ideas of the businesses that went out of business are still around.

comment by johnswentworth · 2023-09-13T21:17:24.918Z · LW(p) · GW(p)

Does anyone know of an "algebra for Bayes nets/causal diagrams"?

More specifics: rather than using a Bayes net to define a distribution, I want to use a Bayes net to state a property which a distribution satisfies. For instance, a distribution P[X, Y, Z] satisfies the diagram X -> Y -> Z if-and-only-if the distribution factors according to
P[X, Y, Z] = P[X] P[Y|X] P[Z|Y].
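As a concrete sketch of "a diagram as a property of a distribution" (not the algebra being asked for, just the brute-force check it would replace), here is one way to test numerically whether a finite joint distribution satisfies X -> Y -> Z. The function name and the dict-of-tuples representation are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math


def satisfies_chain(joint, tol=1e-9):
    """Check whether joint[(x, y, z)] factors as P[X] P[Y|X] P[Z|Y].

    Uses the equivalent cross-multiplied form
        P[x, y, z] * P[y] == P[x, y] * P[y, z],
    which avoids dividing by zero when P[y] = 0.
    """
    p_xy, p_yz, p_y = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        p_xy[(x, y)] += p
        p_yz[(y, z)] += p
        p_y[y] += p
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    return all(
        math.isclose(
            joint.get((x, y, z), 0.0) * p_y[y],
            p_xy[(x, y)] * p_yz[(y, z)],
            abs_tol=tol,
        )
        for x, y, z in product(xs, ys, zs)
    )


# A genuine Markov chain: Y is a noisy copy of X, Z is a noisy copy of Y.
chain = {
    (x, y, z): 0.5 * (0.9 if y == x else 0.1) * (0.8 if z == y else 0.2)
    for x, y, z in product((0, 1), repeat=3)
}
# A counterexample: X, Y independent fair coins and Z = X XOR Y,
# so Z depends on X even after conditioning on Y.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

print(satisfies_chain(chain), satisfies_chain(xor))
```

A diagram algebra would let one skip this kind of per-distribution check and reason about which diagrams imply which others directly.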

When using diagrams that way, it's natural to state a few properties in terms of diagrams, and then derive some other diagrams they imply. For instance, if a distribution P[W, X, Y, Z] satisfies all of:

W -> Y -> Z
W -> X -> Y
X -> (W, Y) -> Z

... then it also satisfies W -> X -> Y -> Z.

What I'm looking for is a set of rules for "combining diagrams" this way, without needing to go back to the underlying factorizations in order to prove things.

David and I have been doing this sort of thing a lot in our work the past few months, and it would be nice if someone else already had a nice write-up of the rules for it.

comment by johnswentworth · 2022-05-25T01:27:30.914Z · LW(p) · GW(p)

Weather just barely hit 80°F today, so I tried the Air Conditioner Test [LW · GW].

Three problems came up:

Turns out my laser thermometer is all over the map. Readings would change by 10°F if I went outside and came back in. My old-school thermometer is much more stable (and well-calibrated, based on dipping it in some ice water), but slow and caps out around 90°F (so I can't use it to measure e.g. exhaust temp). I plan to buy a bunch more old-school thermometers for the next try.
I thought opening the doors/windows in rooms other than the test room and setting up a fan would be enough to make the temperature in the hall outside the test room close to outdoor temp. This did not work; hall temp was around 72°F with outside around 80°F. I'll need to change that part of the experiment design; most likely I'll seal around the door and let air infiltrate exclusively from the window instead. (The AC is right next to the window, so this could screw with the results, but I don't really have a better option.)
In two-hose mode, the AC hit its minimum temperature of 60°F, so I'll need a hotter day. I'll try again when we hit at least 85°F.

In case anyone's wondering: in one-hose mode, the temperature in the room equilibrated around 66°F. Power consumption was near-constant throughout all conditions.

One additional Strange Observation: cool air was blowing out under the door of the test room in two-hose mode. This should not happen; my best guess is that, even though the AC has two separate intake vents, the two are not actually partitioned internally, so the fan for indoor-air was pulling in outdoor-air (causing air to blow out under the door to balance that extra inflow). Assuming that's the cause, it should be fixable with some strategically-placed cardboard inside the unit.

comment by johnswentworth · 2021-06-25T19:58:25.752Z · LW(p) · GW(p)

Chrome is offering to translate the LessWrong homepage for me. Apparently, it is in Greek.

Replies from: habryka4

↑ comment by habryka (habryka4) · 2021-06-25T20:00:30.969Z · LW(p) · GW(p)

Huh, amusing. We do ship a font that has nothing but the greek letter set in it, because people use greek unicode symbols all the time and our primary font doesn't support that character set. So my guess is that's where Google gets confused.

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-06-25T20:55:16.539Z · LW(p) · GW(p)

Oh, I had just assumed it was commentary on the writing style/content.

Replies from: Viliam

↑ comment by Viliam · 2021-06-25T21:11:25.274Z · LW(p) · GW(p)

If about 10% of articles have "Ω" in their title, what is the probability that the page is in Greek? :D

comment by johnswentworth · 2020-02-27T19:04:55.439Z · LW(p) · GW(p)

What if physics equations were written like statically-typed programming languages?

$(\frac{\text{mass} \cdot \text{length}}{\text{time}^{2}} : F) = (\frac{\text{mass}}{-} : m) (\frac{\text{length}}{\text{time}^{2}} : a)$

$(\frac{\text{mass}}{\text{length} \cdot \text{time}^{2}} : P) (\frac{\text{length}^{3}}{-} : V) = (\frac{-}{-} : N) (\frac{\text{mass} \cdot \text{length}^{2}}{\text{time}^{2} \cdot \text{temp}} : R) (\frac{\text{temp}}{-} : T)$
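For a feel of what the "type checker" would be doing, here is a minimal sketch in Python. It checks dimensions at runtime rather than statically, and the `Quantity` class and the exponent-tuple encoding are made up for illustration (N is treated as dimensionless, as in the second equation above).

```python
from dataclasses import dataclass

# Dimensions are exponent tuples over (mass, length, time, temp).
MASS = (1, 0, 0, 0)
ACCELERATION = (0, 1, -2, 0)   # length / time^2
FORCE = (1, 1, -2, 0)          # mass * length / time^2


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple

    def __mul__(self, other):
        # Multiplying quantities adds dimension exponents.
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other):
        # Dividing quantities subtracts dimension exponents.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)

    def __add__(self, other):
        # Adding quantities of different dimensions is the "type error".
        if self.dims != other.dims:
            raise TypeError(f"dimension mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)


mass = Quantity(2.0, MASS)
accel = Quantity(3.0, ACCELERATION)
force = mass * accel
print(force.dims == FORCE)  # the product carries the dimensions of F
```

Trying `force + mass` raises `TypeError`, which is the runtime analogue of the equation failing to type-check.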

Replies from: jimrandomh, steve2152

↑ comment by jimrandomh · 2020-02-27T19:54:25.639Z · LW(p) · GW(p)

The math and physics worlds still use single-letter variable names for everything, decades after the software world realized that was extremely bad practice. This makes me pessimistic about the adoption of better notation practices.

Replies from: johnswentworth

↑ comment by johnswentworth · 2020-02-27T23:10:11.722Z · LW(p) · GW(p)

Better? I doubt it. If physicists wrote equations the way programmers write code, a simple homework problem would easily fill ten pages.

Verboseness works for programmers because programmers rarely need to do anything more complicated with their code than run it - analogous to evaluating an expression, for a physicist or mathematician. Imagine if you needed to prove one program equivalent to another algebraically - i.e. a sequence of small transformations, with a record of intermediate programs derived along the way in order to show your work. I expect programmers subjected to such a use-case would quickly learn the virtues of brevity.

↑ comment by Steven Byrnes (steve2152) · 2020-02-27T19:26:49.729Z · LW(p) · GW(p)

Yeah, I'm apparently not intelligent enough to do error-free physics/engineering calculations without relying on dimensional analysis as a debugging tool. I even came up with a weird, hack-y way to do that in computing environments like Excel and Cython, where flexible multiplicative types are not supported.

comment by johnswentworth · 2025-04-10T18:08:56.377Z · LW(p) · GW(p)

Is interpersonal variation in anxiety levels mostly caused by dietary iron?

~~I stumbled across this paper~~ yesterday. I haven't looked at it very closely yet, but the high-level pitch is that they look at genetic predictors of iron deficiency and then cross that with anxiety data. It's interesting mainly because it sounds pretty legit (i.e. the language sounds like direct presentation of results without any bullshitting, the p-values are satisfyingly small, there's no branching paths), and the effect sizes are BIG IIUC:

The odd ratios (OR) of anxiety disorders per 1 standard deviation (SD) unit increment in iron status biomarkers were 0.922 (95% confidence interval (CI) 0.862–0.986; p = 0.018) for serum iron level, 0.873 (95% CI 0.790–0.964; p = 0.008) for log-transformed ferritin and 0.917 (95% CI 0.867–0.969; p = 0.002) for transferrin saturation. But no statical significance was found in the association of 1 SD unit increased total iron-binding capacity (TIBC) with anxiety disorders (OR 1.080; 95% CI 0.988–1.180; p = 0.091). The analyses were supported by pleiotropy test which suggested no pleiotropic bias.

Odds ratio of anxiety disorders changes by roughly 0.9 per standard deviation in iron level, across four different measures of iron level. (Note that TIBC, the last of the four iron level measures, didn't hit statistical significance but did have a similar effect size to the other three.)

~~Just eyeballing those effect sizes... man, it kinda sounds like iron levels are maybe the main game for most anxiety? Am I interpreting that right? Am I missing something here?~~

EDIT: I read more, and it turns out the wording of the part I quoted was misleading. The number 0.922, for instance, was the odds ratio AT +1 standard deviation serum iron level, not PER +1 standard deviation serum iron level. That would be -0.078 PER standard deviation serum iron level, so it's definitely not the "main game for most anxiety".
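To get a feel for how small that is, here is a quick arithmetic sketch converting an odds ratio into a change in probability. The 10% baseline prevalence is a made-up number for illustration, not something taken from the paper.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.10  # hypothetical baseline prevalence of anxiety disorders
shifted = apply_odds_ratio(baseline, 0.922)  # 0.922 is the serum iron OR quoted above
print(baseline, shifted)
```

Under that assumed baseline, one standard deviation of serum iron moves the probability from 10% to a bit over 9%.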

Replies from: xpostah

↑ comment by samuelshadrach (xpostah) · 2025-04-10T18:17:56.250Z · LW(p) · GW(p)

Have you tested this hypothesis on your friends? Ask them for their iron level from last blood test, and ask them to self-report anxiety level (you also make a separate estimate of their anxiety level).

comment by johnswentworth · 2024-02-16T16:56:49.431Z · LW(p) · GW(p)

I keep seeing news outlets and the like say that SORA generates photorealistic videos, can model how things move in the real world, etc. This seems like blatant horseshit? Every single example I've seen looks like video game animation, not real-world video.

Have I just not seen the right examples, or is the hype in fact decoupled somewhat from the model's outputs?

Replies from: ryan_greenblatt

↑ comment by ryan_greenblatt · 2024-02-16T17:26:04.412Z · LW(p) · GW(p)

I think I mildly disagree, but probably we're looking at the same examples.

I think the most impressive (in terms of realism) videos are under "Sora is able to generate complex scenes with multiple characters, ...". (Includes white SUV video and Tokyo suburbs video.)

I think all of these videos other than the octopus and paper planes are "at-a-glance" photorealistic to me.

Overall, I think SORA can do "at-a-glance" photorealistic videos and can model to some extent how things move in the real world. I don't think it can do both complex motion and photorealism in the same video. As in, the videos which are photorealistic don't really involve complex motion and the videos which involve complex motion aren't photorealistic.

(So probably some amount of hype, but also pretty real?)

Replies from: habryka4, johnswentworth

↑ comment by habryka (habryka4) · 2024-02-16T19:51:24.689Z · LW(p) · GW(p)

Hmm, I don't buy it. These two scenes seem very much not like the kind of thing a video game engine could produce:

Look at this frame! I think there is something very slightly off about that face, but the cat hitting the person's face and the person's reaction seem very realistic to me and IMO qualifies as "complex motion and photorealism in the same video".

Replies from: johnswentworth, ryan_greenblatt, RamblinDash

↑ comment by johnswentworth · 2024-02-17T00:33:26.491Z · LW(p) · GW(p)

Were these supposed to embed as videos? I just see stills, and don't know where they came from.

Replies from: ryan_greenblatt

↑ comment by ryan_greenblatt · 2024-02-17T00:34:07.307Z · LW(p) · GW(p)

These are stills from some of the videos I was referencing.

↑ comment by ryan_greenblatt · 2024-02-16T22:07:55.926Z · LW(p) · GW(p)

TBC, I wasn't claiming anything about video game engines.

I wouldn't have called the cat one "complex motion", but I can see where you're coming from.

↑ comment by RamblinDash · 2024-02-16T20:08:30.209Z · LW(p) · GW(p)

Yeah, I mean I guess it depends on what you mean by photorealistic. That cat has three front legs.

Replies from: gwern

↑ comment by gwern · 2024-02-16T21:10:55.864Z · LW(p) · GW(p)

Yeah, this is the example I've been using to convince people that the game engines are almost certainly generating training data but are probably not involved at sampling time. I can't come up with any sort of hybrid architecture like 'NN controlling game-engine through API' where you get that third front leg. One of the biggest benefits of a game-engine would be ensuring exactly that wouldn't happen - body parts becoming detached and floating in mid-air and lack of conservation. If you had a game engine with a hyper-realistic cat body model in it which something external was manipulating, one of the biggest benefits is that you wouldn't have that sort of common-sense physics problem. (Meanwhile, it does look like past generative modeling of cats in its errors. Remember the ProGAN interpolation videos of CATS? Hilarious, but also an apt demonstration of how extremely hard cats are to model. They're worse than hands.)

In addition, you see plenty of classic NN tells throughout - note the people driving a 'Dandrover'...

↑ comment by johnswentworth · 2024-02-16T17:47:27.533Z · LW(p) · GW(p)

Yeah, those were exactly the two videos which most made me think that the model was mostly trained on video game animation. In the Tokyo one, the woman's facial muscles never move at all, even when the camera zooms in on her. And in the SUV one, the dust cloud isn't realistic, but even covering that up the SUV has a Grand Theft Auto look to its motion.

"Can't do both complex motion and photorealism in the same video" is a good hypothesis to track, thanks for putting that one on my radar.

Replies from: ryan_greenblatt

↑ comment by ryan_greenblatt · 2024-02-16T18:09:49.408Z · LW(p) · GW(p)

(Note that I was talking about the one with the train going through Tokyo suburbs.)

comment by johnswentworth · 2023-08-01T04:11:37.894Z · LW(p) · GW(p)

Putting this here for posterity: I have thought since the superconductor preprint went up, and continue to think, that the markets are putting generally too little probability on the claims being basically-true. I thought ~70% after reading the preprint the day it went up (and bought up a market on manifold to ~60% based on that, though I soon regretted not waiting for a better price), and my probability has mostly been in the 40-70% range since then.

Replies from: johnswentworth

↑ comment by johnswentworth · 2023-08-01T05:32:15.424Z · LW(p) · GW(p)

After seeing the markets jump up in response to the latest, I think I'm more like 65-80%.

comment by johnswentworth · 2022-05-29T00:27:20.149Z · LW(p) · GW(p)

Languages should have tenses for spacelike separation. My friend and I do something in parallel, it's ambiguous/irrelevant which one comes first, I want to say something like "I expect my friend <spacelike version of will do/has done/is doing> their task in such-and-such a way".

Replies from: JBlack, adamShimi, kave

↑ comment by JBlack · 2022-05-29T01:24:48.646Z · LW(p) · GW(p)

That sounds more like a tenseless sentence than using a spacelike separation tense. Your friend's performance of the task may well be in your future or past lightcone (or extend through both), but you don't wish to imply any of these.

There are languages with tenseless verbs, as well as some with various types of spatial tense.

The closest I can approximate this in English without clumsy constructs is "I expect my friend does their task in such-and-such a way", which I agree isn't very satisfactory.

↑ comment by adamShimi · 2022-05-29T08:34:31.855Z · LW(p) · GW(p)

Who would have thought that someone would ever look at CSP and think "I want English to be more like that"?

Replies from: johnswentworth

↑ comment by johnswentworth · 2022-05-29T16:23:11.423Z · LW(p) · GW(p)

lol

↑ comment by kave · 2022-05-29T00:43:25.372Z · LW(p) · GW(p)

Future perfect (hey, that's the name of the show!) seems like a reasonable hack for this in English

comment by johnswentworth · 2021-10-28T18:56:15.332Z · LW(p) · GW(p)

Two kinds of cascading catastrophes one could imagine in software systems...

A codebase is such a spaghetti tower (and/or coding practices so bad) that fixing a bug introduces, on average, more than one new bug. Software engineers toil away fixing bugs, making the software steadily more buggy over time.
Software services managed by different groups have dependencies - A calls B, B calls C, etc. Eventually, the dependence graph becomes connected enough and loopy enough that a sufficiently-large chunk going down brings down most of the rest, and nothing can go back up until everything else goes back up (i.e. there's circular dependence/deadlock).

How could we measure how "close" we are to one of these scenarios going supercritical?

For the first, we'd need to have attribution of bugs - i.e. track which change introduced each bug. Assuming most bugs are found and attributed after some reasonable amount of time, we can then estimate how many bugs each bug fix introduces, on average.
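A minimal sketch of that estimate, assuming we have a list of change ids that were bug fixes plus, for each known bug, the id of the change blamed for introducing it. The function name and data layout are hypothetical.

```python
from collections import Counter


def bugs_per_fix(fix_ids, bug_origins):
    """Average number of attributed bugs introduced per bug-fix change.

    fix_ids: change ids which were bug fixes.
    bug_origins: for each known bug, the id of the change which introduced it
        (None if the bug has not been attributed).
    """
    fix_ids = set(fix_ids)
    if not fix_ids:
        raise ValueError("need at least one bug fix to estimate a rate")
    # Count only bugs whose introducing change was itself a bug fix.
    introduced = Counter(origin for origin in bug_origins if origin in fix_ids)
    return sum(introduced.values()) / len(fix_ids)


fixes = ["c1", "c2", "c3", "c4"]
origins = ["c1", "c1", "c3", "feature-9", None]
rate = bugs_per_fix(fixes, origins)
print(rate)  # a rate above 1 would be the supercritical regime
```

Since unattributed and not-yet-found bugs are left out, the estimate is a lower bound on the true rate.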

(I could also imagine a similar technique for e.g. medicine: check how many new problems result from each treatment of a problem.)

For the second, we'd need visibility into codebases maintained by different groups, which would be easy within a company but much harder across companies. In principle, within a company, some kind of static analysis tool could go look for all the calls to APIs between services, map out the whole graph, and then calculate which "core" pieces could be involved in a catastrophic failure.
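One rough way to do the last step, assuming the call graph has already been extracted into a dict mapping each service to the services it calls: find the strongly connected components, since any component with more than one service (or a self-call) is a set of services which cannot come back up independently. This is a sketch using Tarjan's algorithm; the service names are made up.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each service to the services it calls.

    Recursive for brevity, so very deep graphs would need an iterative version.
    """
    index, lowlink, on_stack = {}, {}, set()
    stack, components = [], []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for callee in graph.get(node, ()):
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # node is the root of a component; pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


def circular_cores(graph):
    """Components containing a dependency cycle (including self-calls)."""
    return [
        component
        for component in strongly_connected_components(graph)
        if len(component) > 1 or any(n in graph.get(n, ()) for n in component)
    ]


calls = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
cores = circular_cores(calls)
print(cores)
```

Here A, B and C form a circular core while D does not; the size of the largest core relative to the whole graph would be one crude measure of how close the system is to the deadlock scenario.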

(Note that this problem could be mostly-avoided by intentionally taking down services occasionally, so engineers are forced to build around that possibility. I don't think any analogue of this approach would work for the first failure-type, though.)

comment by johnswentworth · 2021-05-21T04:03:36.818Z · LW(p) · GW(p)

I wish there were a fund roughly like the Long-Term Future Fund, but with an explicit mission of accelerating intellectual progress.

Replies from: habryka4, quinn-dougherty

↑ comment by habryka (habryka4) · 2021-05-21T04:11:36.843Z · LW(p) · GW(p)

I mean, just to be clear, I am all in favor of intellectual progress. But doing so indiscriminately does sure seem a bit risky in this world of anthropogenic existential risks. Reminds me of my mixed feelings on the whole Progress Studies thing.

Replies from: johnswentworth

↑ comment by johnswentworth · 2021-05-21T04:36:23.051Z · LW(p) · GW(p)

Yeah, I wouldn't want to accelerate e.g. black-box ML. I imagine the real utility of such a fund would be to experiment with ways to accelerate intellectual progress and gain understanding of the determinants, though the grant projects themselves would likely be more object-level than that. Ideally the grants would be in areas which are not themselves very risk-relevant, but complicated/poorly-understood enough to generate generalizable insights into progress.

I think it takes some pretty specific assumptions for such a thing to increase risk significantly on net. If we don't understand the determinants of intellectual progress, then we have very little ability to direct progress where we want it; it just follows whatever the local gradient is. With more understanding, at worst it follows the same gradient faster, and we end up in basically the same spot.

The one way it could net-increase risk is if the most likely path of intellectual progress leads to doom, and the best way to prevent doom is through some channel other than intellectual progress (like political action, for instance). Then accelerating the intellectual progress part potentially gives the other mechanisms (like political bodies) less time to react. Personally, though, I think a scenario in which e.g. political action successfully prevents intellectual progress from converging to doom (in a world where it otherwise would have) is vanishingly unlikely (like, less than one-in-a-hundred, maybe even less than one-in-a-thousand).

↑ comment by Quinn (quinn-dougherty) · 2021-05-23T11:23:18.820Z · LW(p) · GW(p)

You might check out Donald Braben's view, which says that "transformative research" (i.e. fundamental results that create new fields and industries) is critical for the survival of civilization. He does not worry that transformative results might end civilization.

comment by johnswentworth · 2020-03-30T20:47:16.835Z · LW(p) · GW(p)

For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.

For instance: suppose I'm thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to $N_{\text{infected}}$. But, assuming we already have reasonably good data on number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to $N_{\text{infected}}$. So, multiplying those two together, I'll get a number roughly independent of $N_{\text{infected}}$.
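Spelled out, as a sketch under two assumptions: the chance of catching C19 from one encounter is some constant $c$ times the number infected, and the number hospitalized $H$ is known from data. Both $c$ and $H$ are symbols introduced here for illustration.

$P[\text{hospitalized via this encounter}] \approx \underbrace{c \, N_{\text{infected}}}_{P[\text{catch}]} \cdot \underbrace{\frac{H}{N_{\text{infected}}}}_{P[\text{hospitalized} \mid \text{infected}]} = c \, H$

The unknown number infected cancels, leaving only the contact-risk constant and the observed hospitalization count.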

How general is this? Does some version of it apply to long-term scenarios too (possibly accounting for herd immunity)? What short-term decisions do depend on $N_{\text{infected}}$?

comment by johnswentworth · 2024-05-06T21:35:58.778Z · LW(p) · GW(p)

Way back in the halcyon days of 2005, a company called Cenqua had an April Fools' Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I'm wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there's now a clear market leader for that particular product niche, but for real.

Replies from: D0TheMath, AlfredHarwood

↑ comment by Garrett Baker (D0TheMath) · 2024-05-06T21:38:58.882Z · LW(p) · GW(p)

Archived website

Replies from: johnswentworth

↑ comment by johnswentworth · 2024-05-06T21:42:17.473Z · LW(p) · GW(p)

You are a scholar and a gentleman.

↑ comment by A.H. (AlfredHarwood) · 2024-05-06T21:41:55.626Z · LW(p) · GW(p)

Here is an archived version of the page:

http://web.archive.org/web/20050403015136/http://www.cenqua.com/commentator/

comment by johnswentworth · 2022-10-24T20:51:36.059Z · LW(p) · GW(p)

Here's an interesting problem of embedded agency/True Names which I think would make a good practice problem: formulate what it means to "acquire" something (in the sense of "acquiring resources"), in an embedded/reductive sense. In other words, you should be able-in-principle to take some low-level world-model, and a pointer to some agenty subsystem in that world-model, and point to which things that subsystem "acquires" and when.

Some prototypical examples which an answer should be able to handle well:

Organisms (anything from bacteria to plants to animals) eating things, absorbing nutrients, etc.
Humans making money or gaining property.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2022-10-28T12:57:08.421Z · LW(p) · GW(p)

...and how the brain figures this out and why it is motivated to do so. There are a lot of simple animals that apparently "try to control" resources or territory. How?

Drives to control resources occur everywhere. And your control of resources is closely related to your dominance in a dominance hierarchy, which seems to be regulated in many animals by serotonin. See e.g. https://www.nature.com/articles/s41386-022-01378-2

comment by johnswentworth · 2023-04-18T20:31:44.947Z · LW(p) · GW(p)

An interesting conundrum: one of the main challenges of designing useful regulation for AI is that we don't have any cheap and robust way to distinguish a dangerous neural net from a non-dangerous net (or, more generally, a dangerous program from a non-dangerous program). This is an area where technical research could, in principle, help a lot.

The problem is, if there were some robust metric for how dangerous a net is, and that metric were widely known and recognized (as it would probably need to be in order to be used for regulatory purposes), then someone would probably train a net to maximize that metric directly.

Replies from: D0TheMath, Thane Ruthenis, Thane Ruthenis

↑ comment by Garrett Baker (D0TheMath) · 2023-04-18T23:00:02.828Z · LW(p) · GW(p)

This seems to lead to the solution of trying to make your metric one-way, in the sense that your metric should

Provide an upper-bound on the dangerousness of your network
Compress the space of networks which map to approximately the same dangerousness level on the low end of dangerousness, and expand the space of networks which map to approximately the same dangerousness level on the upper end of dangerousness, so that you can train your network to minimize the metric, but when you train your network to maximize the metric you end up in a degenerate area with technically very high measured danger levels but in actuality very low levels of dangerousness.

We can hope (or possibly prove) that as you optimize upwards on the metric you become subject to Goodhart's curse, but the opposite occurs on the lower end.

↑ comment by Thane Ruthenis · 2023-04-18T20:56:03.697Z · LW(p) · GW(p)

Sure, even seems a bit tautological: any such metric, to be robust, would need to contain in itself a definition of a dangerously-capable AI, so you probably wouldn't even need to train a model to maximize it. You'd be able to just lift the design from the metric directly.

↑ comment by Thane Ruthenis · 2023-05-14T01:47:37.720Z · LW(p) · GW(p)

Do you have any thoughts on a softer version of this problem, where the metric can't be maximized directly, but gives a concrete idea of what sort of challenge your AI needs to beat to qualify as AGI? (And therefore in which direction in the architectural-design-space you should be moving.)

Some variation on this [LW(p) · GW(p)] seems like it might work as a "fire alarm" test set, but as you point out, inasmuch as it's recognized, it'll be misapplied for benchmarking instead.

(I suppose the ideal way to do it would be to hand it off to e.g. ARC, so they can use it if OpenAI invites them for safety-testing again. This way, SOTA models still get tested, but the actors who might misuse it aren't aware of the testing's particulars until they succeed anyway...)

comment by johnswentworth · 2021-09-28T04:46:02.895Z · LW(p) · GW(p)

I just went looking for a good reference for the Kelly criterion, and didn't find any on LessWrong. So, for anybody who's looking: chapter 6 of Cover & Thomas's textbook on information theory is the best source I currently know of.

Replies from: Yoav Ravid

↑ comment by Yoav Ravid · 2021-09-28T04:54:05.401Z · LW(p) · GW(p)

Might be a good thing to add to the Kelly Criterion tag [? · GW]

comment by johnswentworth · 2020-10-26T23:33:28.844Z · LW(p) · GW(p)

Neat problem of the week: we have $n$ discrete random variables, $X_1, \ldots, X_n$. Given any one variable, all the variables are independent:

$\forall i : P [X | X_{i}] = \prod_{j} P [X_{j} | X_{i}]$

Characterize the distributions which satisfy this requirement.
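For anyone who wants to test candidate distributions numerically before attempting a proof, here is a rough sketch of a checker. It assumes the joint distribution is given as an $n$-dimensional NumPy array (one axis per variable); the function name and the two example distributions are illustrative choices, not part of the problem statement. To avoid dividing by zero, it checks the equivalent multiplied-out form $P[X] \, P[X_i]^{n-2} = \prod_{j \neq i} P[X_i, X_j]$.

```python
import numpy as np


def satisfies_condition(joint, atol=1e-9):
    """Check whether, for every i, all variables are independent given X_i.

    `joint` is an n-dimensional array of probabilities, one axis per variable.
    Uses the division-free form: P[X] * P[X_i]^(n-2) == prod_{j != i} P[X_i, X_j].
    """
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    for i in range(n):
        # Marginal P[X_i], kept broadcastable against the full joint.
        other_axes = tuple(a for a in range(n) if a != i)
        p_i = joint.sum(axis=other_axes, keepdims=True)
        # Product over j != i of the pairwise marginals P[X_i, X_j].
        rhs = np.ones_like(joint)
        for j in range(n):
            if j == i:
                continue
            drop_axes = tuple(a for a in range(n) if a not in (i, j))
            rhs = rhs * joint.sum(axis=drop_axes, keepdims=True)
        lhs = joint * p_i ** (n - 2)
        if not np.allclose(lhs, rhs, atol=atol):
            return False
    return True


# Example 1: three fully independent bits (outer product of marginals).
independent = np.einsum("i,j,k->ijk",
                        np.array([0.3, 0.7]),
                        np.array([0.5, 0.5]),
                        np.array([0.2, 0.8]))

# Example 2: X1, X2 are fair coin flips and X3 = X1 XOR X2.
xor = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        xor[a, b, a ^ b] = 0.25

print(satisfies_condition(independent))  # True
print(satisfies_condition(xor))          # False: given X3, X1 and X2 are correlated
```

A checker like this only confirms or rules out specific examples; it says nothing by itself about the general characterization.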

This problem came up while working on the theorem in this post [LW · GW], and (separately) in the ideas behind this post [? · GW]. Note that those posts may contain some spoilers for the problem, though frankly my own proofs on this one just aren't very good.

comment by johnswentworth · 2022-04-01T17:56:41.368Z · LW(p) · GW(p)
