l-rudolf-l

The scenario does not say that AI progress slows down. What I imagined to be happening is that after 2028 or so, there is AI research being done by AIs at unprecedented speeds, and this drives raw intelligence forward more and more, but (1) the AIs still need to run expensive experiments to make progress sometimes, and (2) basically nothing is bottlenecked by raw intelligence anymore so you don't really notice it getting even better.

Comment by L Rudolf L (LRudL) on A History of the Future, 2025-2040 · 2025-02-21T20:17:46.433Z · LW · GW

I will admit I'm not an expert here. The intuition behind this is that if you grant extreme performance at mathsy things very soon, it doesn't seem unreasonable that the AIs will make some radical breakthrough in the hard sciences surprisingly soon, while still being bad at many other things. In the scenario, note that it's a "mathematical framework" (implicitly a sufficiently big advance in what we currently have such that it wins a Nobel) but not the final theory of everything, and it's explicitly mentioned empirical data bottlenecks it.

Comment by L Rudolf L (LRudL) on A History of the Future, 2025-2040 · 2025-02-18T21:35:56.890Z · LW · GW

Thanks for these speculations on the longer-term future!

while I do think Mars will be exploited eventually, I expect the moon to be first for serious robotics effort

Maybe! My vague Claude-given sense is that the Moon is surprisingly poor in important elements though.

not being the fastest amongst them all (because replicating a little better will usually only get a little advantage, not an utterly dominant one), combined with a lot of values being compatible with replicating fast, so value alignment/intent alignment matters more than you think

This is a good point! However, more intelligence in the world also means we should expect competition to be tighter, reducing the amount of slack by which you can deviate from the optimal. In general, I can see plausible abstract arguments for the long-run equilibrium being either Hansonian zero-slack Malthusian competition or absolute unalterable lock-in.

Given no nationalization of the companies has happened, and they still have large freedoms of action, it's likely that Google Deepmind, OpenAI and Anthropic have essentially supplanted the US as the legitimate government, given their monopolies on violence via robots.

I expect the US government to be competent enough to avoid being supplanted by the companies. I think politicians, for all their flaws, are pretty good at recognising a serious threat to their power. There's also only one government but several competing labs.

(Note that the scenario doesn't mention companies in the mid and late 2030s)

the fact that EA types got hired to some of the most critical positions on AI was probably fairly critical in this timeline for preventing the worst outcomes from the intelligence curse from occurring.

In this timeline, a far more important thing is the sense among American political elite that they are freedom-loving people and that they should act in accordance with that, and a similar sense among Chinese political elite that they are a civilised people and that Chinese civilisational continuity is important. A few EAs in government, while good, will find it difficult to match the impact of the cultural norms that a country's leaders inherit and that proscribe their actions.

For example: I've been reading Christopher Brown's Moral Capital recently, which looks at how opposition to slavery rose to political prominence in 1700s Britain. It claims that early strong anti-slavery attitudes were more driven by a sense that slavery was insulting to Britons' sense of themselves as a uniquely liberal people, than by arguments about slave welfare. At least in that example, the major constraint on the treatment of a powerless group of people seems to have been in large part the political elite managing its own self-image.

Comment by L Rudolf L (LRudL) on William_S's Shortform · 2025-02-17T11:51:29.297Z · LW · GW

I built this a few months ago: https://github.com/LRudL/devcon

Definitely not production-ready and might require some "minimal configuration and tweaking" to get working.

Includes a "device constitution" that you set; if you visit a website, Claude will judge whether the page follows that written document, and if not it will block you, and the only way past it is winning a debate with it about why your website visit is in-line with your device constitution.

I found it too annoying but some of my friends liked it.

Comment by L Rudolf L (LRudL) on AI Safety as a YC Startup · 2025-01-08T12:25:38.022Z · LW · GW

However, I think there is a group of people who over-optimize for Direction and neglect the Magnitude. Increasing Magnitude often comes with the risk of corrupting the Direction. For example, scaling fast often makes it difficult to hire only mission-aligned people, and it requires you to give voting power to investors that prioritizes profit. To increase Magnitude can therefore feel risky, what if I end up working at something that is net-negative for the world? Therefore it might be easier for one's personal sanity to optimize for Direction, to do something that is unquestionably net-positive. But this is the easy way out, and if you want to have the highest expected value of your Impact, you cannot disregard Magnitude.

You talk here about an impact/direction v ambition/profit tradeoff. I've heard many other people talking about this tradeoff too. I think it's overrated; in particular, if you're constantly having to think about it, that's a bad sign.

It's rare that you have a continuous space of options between lots of impact and low profit, and low/negative impact and high profit.
If you do have such a continuous space of options then I think you are often just screwed and profit incentives will win.
The really important decision you make is probably a discrete choice: do you start an org trying to do X, or an org trying to do Y? Usually you can't (and even if you can, shouldn't) try to interpolate between these things, and making this high-level strategy call will probably shape your impact more than any later finetuning of parameters within that strategy.
Often, the profit incentives point towards the more-obvious, gradient-descent-like path, which is usually very crowded and leads to many "mediocre" outcomes (e.g. starting a $10M company), but the biggest things come from doing "Something Else Which Is Not That" (as is said in dath ilan). For example, SpaceX (ridiculously hard and untested business proposition) and Facebook (started out seeing very small and niche and with no clue of where the profit was).

Instead, I think the real value of doing things that are startup-like comes from:

The zero-to-one part of Peter Thiel's zero-to-one v one-to-n framework: the hardest, progress-bottlenecking things usually look like creating new things, rather than scaling existing things. For example, there is very little you can do today in American politics that is as impactful or reaches as deep into the future as founding America in the first place.
In the case of AI safety: neglectedness. Everyone wants to work at a lab instead, humans are too risk averse in general, etc. (I've heard many people in AI safety say that neglectedness is overrated. There are arguments like this one that replaceability/neglectedness considerations aren't that major: job performance is heavy-tailed, hiring is hard for orgs, etc. But such arguments seem like weirdly myopic parameter-fiddling, at least when the alternative is zero-to-one things like discussed above. Starting big things is in fact big. Paradigm shifts matter because they're the frame that everything else takes place in. You either see this or you don't.)
To the extent you think the problem is about economic incentives or differential progress, have you considered getting your hands dirty and trying to change the actual economy or the direction of the tech tree? There are many ways to do this, including some types of policy and research. But I think the AI safety scene has a cultural bias towards things that look like research or information-gathering, and away from being "builders" in the Silicon Valley sense. One of the things that Silicon Valey does get right is that being a builder is very powerful. If the AI debate comes down to a culture/influence struggle between anti-steering, e/acc-influenced builder types and pro-steering EA-influenced academic types, it doesn't look good for the world.

Comment by L Rudolf L (LRudL) on Review: Planecrash · 2025-01-07T10:53:03.329Z · LW · GW

Thanks for the heads-up, that looks very convenient. I've updated the post to link to this instead of the scraper repo on GitHub.

Comment by L Rudolf L (LRudL) on quila's Shortform · 2025-01-06T16:43:40.787Z · LW · GW

As far as I know, my post started the recent trend you complain about.

Several commenters on this thread (e.g. @Lucius Bushnaq here and @MondSemmel here) mention LessWrong's growth and the resulting influx of uninformed new users as the likely cause. Any such new users may benefit from reading my recently-curated review of Planecrash, the bulk of which is about summarising Yudkowsky's worldview.

i continue to feel so confused at what continuity led to some users of this forum asking questions like, "what effect will superintelligence have on the economy?" or otherwise expecting an economic ecosystem of superintelligences

If there's decision-making about scarce resources, you will have an economy. Even superintelligence does not necessarily imply infinite abundance of everything, starting with the reason that our universe only has so many atoms. Multipolar outcomes seem plausible under continuous takeoff, which the consensus view in AI safety (as I understand it) sees as more likely than fast takeoff. I admit that there are strong reasons for thinking that the aggregate of a bunch of sufficiently smart things is agentic, but this isn't directly relevant for the concerns about humans within the system in my post.

a value-aligned superintelligence directly creates utopia

In his review of Peter Singer's commentary on Marx, Scott Alexander writes:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He would come out and say it was irresponsible to talk about how communist governments and economies will work. He believed it was a scientific law, analogous to the laws of physics, that once capitalism was removed, a perfect communist government would form of its own accord. There might be some very light planning, a couple of discussions, but these would just be epiphenomena of the governing historical laws working themselves out.

Peter Thiel might call this "indefinite optimism": delay all planning or visualisation because there's some later point where it's trusted things will all sort themselves out. Now, if you think that takeoff will definitely be extremely hard and the resulting superintelligence will effortlessly take over the world, then obviously it makes sense to focus on what that superintelligence will want to do. But what if takeoff lasts months or years or decades? (Note that there can be lots of change even within months if the stakes look extreme to powerful actors!) Aren't you curious about what an aligned superintelligence will end up deciding about society and humans? Are you so sure about the transition period being so short and the superintelligence being so unitary and multipolar outcomes being so unlikely that we'll never have to worry about problems downstream of the incentive issues and competitive pressures that I discuss (which Beren recently had an excellent post on)? Are you so sure that there is not a single interesting, a priori deducible fact about the superintelligent economy beyond "a singleton is in charge and everything is utopia"?

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2025-01-03T22:26:47.360Z · LW · GW

The bottlenecks to compute production are constructing chip fabs; electricity; the availability of rare earth minerals.

Chip fabs and electricity generation are capital!

Right now, both companies have an interest in a growing population with growing wealth and are on the same side. If the population and its buying power begins to shrink, they will be in an existential fight over the remainder, yielding AI-insider/AI-outsider division.

Yep, AI buying power winning over human buying power in setting the direction of the economy is an important dynamic that I'm thinking about.

I also think the AI labor replacement is initially on the side of equality. [...] Now, any single person who is a competent user of Claude can feasibly match the output of any traditional legal team, [...]. The exclusive access to this labor is fundamental to the power imbalance of wealth inequality, so its replacement is an equalizing force.

Yep, this is an important point, and a big positive effect of AI! I write about this here. We shouldn't lose track of all the positive effects.

Comment by L Rudolf L (LRudL) on Why I'm Moving from Mechanistic to Prosaic Interpretability · 2024-12-31T12:05:58.217Z · LW · GW

Great post! I'm also a big (though biased) fan of Owain's research agenda, and share your concerns with mech interp.

I'm therefore coining the term "prosaic interpretability" - an approach to understanding model internals [...]
Concretely, I've been really impressed by work like Owain Evans' research on the Reversal Curse, Two-Hop Curse, and Connecting the Dots^[3]. These feel like they're telling us something real, general, and fundamental about how language models think. Despite being primarily empirical, such work is well-formulated conceptually, and yields gearsy mental models of neural nets, independently of existing paradigms.
[emphasis added]

I don't understand how the papers mentioned are about understanding model internals, and as a result I find the term "prosaic interpretability" confusing.

Some points that are relevant in my thinking (stealing a digram from an unpublished draft of mine):

the only thing we fundamentally care about with LLMs is the input-output behaviour (I-O)
now often, a good way to study the I-O map is to first understand the internals M
but if understanding the internals M is hard but you can make useful generalising statements about the I-O, then you might as well skip dealing with M at all (c.f. psychology, lots of econ, LLM papers like this)
the Owain papers you mention seem to me to make 3 distinct types of moves, in this taxonomy:
- finding some useful generalising statement about the I-O map behaviour (potentially conditional on some property of the training data) (e.g. the reversal curse)
- creating a modified model M' from M via fine-tuning on same data (but again, not caring about what the data actually does to the internals)
- (far less centrally than the above!) speculating about what the internal structure that causes the behavioural patterns above might be (e.g. that maybe models trained on "A=B" learn to map representation(A) --> representation(B) in some MLP, instead of learning the general rule that A and B are the same thing and representing them internally as such)

So overall, I don't think the type of work you mention is really focused on internals or interpretability at all, except incidentally in minor ways. (There's perhaps a similar vibe difference here to category theory v set theory: the focus being relations between (black-boxed) objects, versus the focus being the internals/contents of objects, with relations and operations defined by what they do to those internals)

I think thinking about internals can be useful—see here for a Neel Nanda tweet arguing the reversal curse if obvious if you understand mech interp—but also the blackbox research often has a different conceptual frame, and is often powerful specifically when it can skip all theorising about internals while still bringing true generalising statements about models to the table.

And therefore I'd suggest a different name than "prosaic interpretability". "LLM behavioural science"? "Science of evals"? "Model psychology"? (Though I don't particularly like any of these terms)

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-31T10:12:12.710Z · LW · GW

If takeoff is more continuous than hard, why is it so obvious that there exists exactly one superintelligence rather than multiple? Or are you assuming hard takeoff?

Also, your post writes about "labor-replacing AGI" but writes as if the world it might cause near-term lasts eternally

If things go well, human individuals continue existing (and humans continue making new humans, whether digitally or not). Also, it seems more likely than not that fairly strong property rights continue (if property rights aren't strong, and humans aren't augmented to be competitive with the superintelligences, then prospects for human survival seem weak since humans' main advantage is that they start out owning a lot of the stuff—and yes, that they can shape the values of the AGI, but I tentatively think CEV-type solutions are neither plausible nor necessarily desirable). The simplest scenario is that there is continuity between current and post-singularity property ownership (especially if takeoff is slow and there isn't a clear "reset" point). The AI stuff might get crazy and the world might change a lot as a result, but these guesses, if correct, seem to pin down a lot of what the human situation looks like.

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-30T23:45:13.218Z · LW · GW

I already added this to the start of the post:

Edited to add: The main takeaway of this post is meant to be: Labour-replacing AI will shift the relative importance of human v non-human factors of production, which reduces the incentives for society to care about humans while making existing powers more effective and entrenched. Many people are reading this post in a way where either (a) "capital" means just "money" (rather than also including physical capital like factories and data centres), or (b) the main concern is human-human inequality (rather than broader societal concerns about humanity's collective position, the potential for social change, and human agency).

However:

perhaps you should clarify that you aren't trying to argue that saving money to spend after AGI is a good strategy, you agree it's a bad strategy

I think my take is a bit more nuanced:

in my post, I explicitly disagree with focusing purely on getting money now, and especially oppose abandoning more neglected ways of impacting AI development in favour of ways that also optimise for personal capital accumulation (see the start of the takeaways section)
- the reason is that I think now is a uniquely "liquid" / high-leverage time to shape the world through hard work, especially because the world might soon get much more locked-in and because current AI makes it easier to do things
  - (also, I think modern culture is way too risk averse in general, and worry many people will do motivated reasoning and end up thinking they should accept the quant finance / top lab pay package for fancy AI reasons, when their actual reason is that they just want that security and status for prosaic reasons, and the world would benefit most from them actually daring to work on some neglected impactful thing)
however, it's also true that money is a very fungible resource, and we're heading into very uncertain times where the value of labour (most people's current biggest investment) looks likely to plummet
- if I had to give advice to people who aren't working on influencing AI for the better, I'd focus on generically "moving upwind" in terms of fungible resources: connections, money, skills, etc. If I had to pick one to advise a bystander to optimise for, I'd put social connections above money—robust in more scenarios (e.g. in very politicky worlds where money alone doesn't help), has deep value in any world where humans survive, in post-labour futures even more likely to be a future nexus of status competition, and more life-affirming and happiness-boosting in the meantime

This is despite agreeing with the takes in your earlier comment. My exact views in more detail (comments/summaries in square brackets):

The post-AGI economy might not involve money, it might be more of a command economy. [yep, this is plausible, but as I write here, I'm guessing my odds on this are lower than yours—I think a command economy with a singleton is plausible but not the median outcome]
Even if it involves money, the relationship between how much money someone has before and how much money they have after might not be anywhere close to 1:1. For example: [loss of control, non-monetary power, destructive war] [yep, the capital strategy is not risk-free, but this only really applies re selfish concerns if there are better ways to prepare for post-AGI; c.f. my point about social connections above]
Even if saving money through AGI converts 1:1 into money after the singularity, it will probably be worth less in utility to you
[because even low levels of wealth will max out personal utility post-AGI] [seems likely true, modulo some uncertainty about: (a) utility from positional goods v absolute goods v striving, and (b) whether "everyone gets UBI"-esque stuff is stable/likely, or fails due to despotism / competitive incentives / whatever]
[because for altruistic goals the leverage from influencing AI now is probably greater than leverage of competing against everyone else's saved capital after AGI] [complicated, but I think this is very correct at least for individuals and most orgs]

Regarding:

you are taking it to mean "we'll all be living in egalitarian utopia after AGI" or something like that

I think there's a decent chance we'll live in a material-quality-of-life-utopia after AGI, assuming "Things Go Fine" (i.e. misalignment / war / going-out-with-a-whimper don't happen). I think it's unlikely to be egalitarian in the "everyone has the same opportunities and resources", for the reasons I lay out above. There are lots of valid arguments for why, if Things Go Fine, it will still be much better than today despite that inequality, and the inequality might not practically matter very much because consumption gets maxed out etc. To be clear, I am very against cancelling the transhumanist utopia because some people will be able to buy 30 planets rather than just a continent. But there are some structural things that make me worried about stagnation, culture, and human relevance in such worlds.

In particular, I'd be curious to hear your takes about the section on state incentives after labour-replacing AI, which I don't think you've addressed and which I think is fairly cruxy to why I'm less optimistic than you about things going well for most humans even given massive abundance and tech.

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-30T22:56:34.571Z · LW · GW

For example:

Currently big companies struggle to hire and correctly promote talent for the reasons discussed in my post, whereas AI talent will be easier to find/hire/replicate given only capital & legible info
To the extent that AI ability scales with resources (potentially boosted by inference-time compute, and if SOTA models are no longer available to the public), then better-resourced actors have better galaxy brains
Superhuman intelligence and organisational ability in AIs will mean less bureaucratic rot and communication bandwidth problems in large orgs, compared to orgs made out of human brain -sized chunks, reducing the costs of scale

Imagine for example the world where software engineering is incredibly cheap. You can start a software company very easily, yes, but Google can monitor the web for any company that makes revenue off of software, instantly clone the functionality (because software engineering is just a turn-the-crank-on-the-LLM thing now) and combine it with their platform advantage and existing products and distribution channels. Whereas right now, it would cost Google a lot of precious human time and focus to try to even monitor all the developing startups, let alone launch a competing product for each one. Of course, it might be that Google itself is too bureaucratic and slow to ever do this, but someone else will then take this strategy.

C.f. the oft-quoted thing about how the startup challenge is getting to distribution before the incumbents get to distribution. But if the innovation is engineering, and the engineering is trivial, how do you get time to get distribution right?

(Interestingly, as I'm describing it above the most key thing is not so much capital intensivity, and more just that innovation/engineering is no longer a source of differential advantage because everyone can do it with their AIs really well)

There's definitely a chance that there's some "crack" in this, either from the economics or the nature of AI performance or some interaction. In particular, as I mentioned at the end, I don't think modelling the AI as an approaching blank wall of complete perfect intelligence all-obsoleting intelligence is the right model for short to medium -term dynamics. Would be very curious if you have thoughts.

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-30T22:36:30.878Z · LW · GW

Note, firstly, that money will continue being a thing, at least unless we have one single AI system doing all economic planning. Prices are largely about communicating information. If there are many actors and they trade with each other, the strong assumption should be that there are prices (even if humans do not see them or interact with them). Remember too that however sharp the singularity, abundance will still be finite, and must therefore be allocated.

Though yes, I agree that a superintelligent singleton controlling a command economy means this breaks down.

However it seems far from clear we will end up exactly there. The finiteness of the future lightcone and the resulting necessity of allocating "scarce" resources, the usefulness of a single medium of exchange (which you can see as motivated by coherence theorems if you want), and trade between different entities all seem like very general concepts. So even in futures that are otherwise very alien, but just not in the exact "singleton-run command economy" direction, I expect a high chance that those concepts matter.

Comment by L Rudolf L (LRudL) on Book Summary: Zero to One · 2024-12-29T19:02:47.103Z · LW · GW

Zero to One is a book that everyone knows about, but somehow it's still underrated.

Indefinite v definite in particular is a frame that's stuck with me.

Indefinite:

finance
consulting
"moving upwind" / "keeping your options open"
investing in diversified index funds
theories of impact based on measuring, forecasting, and understanding

Definite:

entrepreneurship
engineering
a clear strategy
bubbles
the Apollo Program
commitment
theories of impact based on specific solution plans and leverage points

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-29T15:33:40.343Z · LW · GW

I think I agree with all of this.

(Except maybe I'd emphasise the command economy possibility slightly less. And compared to what I understand of your ranking, I'd rank competition between different AGIs/AGI-using factions as a relatively more important factor in determining what happens, and values put into AGIs as a relatively less important factor. I think these are both downstream of you expecting slightly-to-somewhat more singleton-like scenarios than I do?)

EDIT: see here for more detail on my take on Daniel's takes.

Overall, I'd emphasize as the main point in my post: AI-caused shifts in the incentives/leverage of human v non-human factors of production, and this mattering because the interests of power will become less aligned with humans while simultaneously power becomes more entrenched and effective. I'm not really interested in whether someone should save or not for AGI. I think starting off with "money won't matter post-AGI" was probably a confusing and misleading move on my part.

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-29T15:22:09.488Z · LW · GW

the strategy of saving money in order to spend it after AGI is a bad strategy.

This seems very reasonable and likely correct (though not obvious) to me. I especially like your point about there being lots of competition in the "save it" strategy because it happens by default. Also note that my post explicitly encourages individuals to do ambitious things pre-AGI, rather than focus on safe capital accumulation.

Comment by L Rudolf L (LRudL) on Review: Planecrash · 2024-12-29T15:13:20.767Z · LW · GW

Shapley value is not that kind of Solution. Coherent agents can have notions of fairness outside of these constraints. You can only prove that for a specific set of (mostly natural) constraints, Shapeley value is the only solution. But there’s no dutchbooking for notions of fairness.

I was talking more about "dumb" in the sense of violates the "common-sense" axioms that were earlier established (in this case including order invariance by assumption), not "dumb" in the dutchbookable sense, but I think elsewhere I use "dumb" as a stand-in for dutchbookable so fair point.

See https://www.lesswrong.com/posts/TXbFFYpNWDmEmHevp/how-to-give-in-to-threats-without-incentivizing-them for the algorithm when the payoff is known.

Looks interesting, haven't had a chance to dig into yet though!

Something that I feel is missing from this review is the amount of intuitions about how minds work and optimization that are dumped at the reader. There are multiple levels at which much of what’s happening to the characters is entirely about AI. Fiction allows to communicate models; and many readers successfully get an intuition for corrigibility before they read the corrigibility tag, or grok why optimizing for nice readable thoughts optimizes against interpretability.
I think an important part of planecrash isn’t in its lectures but in It’s story and the experiences of its characters. While Yudkowsky jokes about LeCun refusing to read it, it is actually arguably one of the most comprehensive ways to learn about decision theory, with many of the lessons taught through experiences of characters and not through lectures.

Yeah I think this is very true, and I agree it's a good way to communicate your worldview.

I do think there are some ways in which the worlds Yudkowksy writes about are ones where his worldview wins. The Planecrash god setup, for example, is quite fine-tuned to make FDT and corrigibility important. This is almost tautological, since as a writer you can hardly do anything else than write the world as you think it works. But it still means that "works in this fictional world" doesn't transfer as much to "works in the real world", even when the fictional stuff is very coherent and well-argued.

Comment by L Rudolf L (LRudL) on Review: Planecrash · 2024-12-29T15:05:24.062Z · LW · GW

It doesn't contain anything I would consider a spoiler.

If you're extra scrupulous, the closest things are:

A description of a bunch of stuff that happens very early on to set up the plot
One revelation about the character development arc of a non-major character
A high-level overview of technical topics covered, and commentary on the general Yudkowskian position on them (with links to precise Planecrash parts covering them), but not spoiling any puzzles or anything that's surprising if you've read a lot of other Yudkowsky
A bunch of long quotes about dath ilani governance structures (but these are not plot relevant to Planecrash at all)
A few verbatim quotes from characters, which I guess would technically let you infer the characters don't die until they've said those words?

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-28T22:26:44.084Z · LW · GW

Important other types of capital, as the term is used here, include:

the physical nuclear power plants
the physical nuts and bolts
data centres
military robots

Capital is not just money!

Why would an AI want to transfer resources to someone just because they have some fiat currency?

Because humans and other AIs will accept fiat currency as an input and give you valuable things as an output.

Surely they have some better way of coordinating exchanges.

All the infra for fiat currency exists; I don't see why the AIs would need to reinvent that, unless they're hiding from human government oversight or breaking some capacity constraint in the financial system, in which case they can just use crypto instead.

It's possible that it instead ends up belonging to whatever AI has the best military robots.

Military robots are yet another type of capital! Note that if it were human soldiers, there would be much more human leverage in the situation, because at least some humans would need to agree to do the soldering, and presumably would get benefits for doing so, and would use the power and leverage they accrue from doing so to push broadly human goals.

The recruitment company is toast.

Or then the recruitment company pivots to using human labour to improve AI, as actually happened with the hottest recent recruiting company! If AI is the best investment, then humans and AIs alike will spend their efforts on AI, and the economy will gradually cater more and more to AI needs over human needs. See Andrew Critch's post here, for example. Or my story here.

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-28T22:13:58.020Z · LW · GW

To be somewhat more fair, the worry here is that in a regime where you don't need society anymore because AIs can do all the work for your society, value conflicts become a bigger deal than today, because there is less reason to tolerate other people's values if you can just found your own society based on your own values, and if you believe in the vulnerable world hypothesis, as a lot of rationalists do, then conflict has existential stakes, and even if not, can be quite bad, so one group controlling the future is better than inevitable conflict.

So to summarise: if we have a multipolar world, and the vulnerable world hypothesis if true, then conflict can be existentially bad and this is a reason to avoid a multipolar world. Didn't consider this, interesting point!

At a foundational level, whether or not our current tolerance for differing values is stable ultimately comes down to we can compensate for the effect of AGI allowing people to make their own society.

Considerations:

offense/defense balance (if offense wins very hard, it's harder to let everyone do their own thing)
tunability-of-AGI-power / implementability of the harm principle (if you can give everyone AGI that can follow very well the rule "don't let these people harm other people", then you can give that AGI safely to everyone and they can build planets however they like but not death ray anyone else's planets)

The latter might be more of a "singleton that allows playgrounds" rather an actual multipolar world though.

Some of my general worries with singleton worlds are:

humanity has all its eggs in one basket—you better hope the governance structure is never corrupted, or never becomes sclerotic; real-life institutions so far have not given me many signs of hope on this count
cultural evolution is a pretty big part of how human societies seem to have improved and relies on a population of cultures / polities
vague instincts towards diversity being good and less fragile than homogeneity or centralisation

Comment is also on substack:

Thanks!

Comment by L Rudolf L (LRudL) on By default, capital will matter more than ever after AGI · 2024-12-28T22:02:57.981Z · LW · GW

This post seems to misunderstand what it is responding to

fwiw, I see this post less as "responding" to something, and more laying out considerations on their own with some contrasting takes as a foil.

(On Substack, the title is "Capital, AGI, and human ambition", which is perhaps better)

that material needs will likely be met (and selfish non-positional preferences mostly satisfied) due to extreme abundance (if humans retain control).

I agree with this, though I'd add: "if humans retain control" and some sufficient combination of culture/economics/politics/incentives continues opposing arbitrary despotism.

I also think that even if all material needs are met, avoiding social stasis and lock-in matters.

Scope sensitive preferences

Scope sensitivity of preferences is a key concept that matters here, thanks for pointing that out.

Various other considerations about types of preferences / things you can care about (presented without endorsement):

instrumental preference to avoid stasis because of a belief it leads to other bad things (e.g. stagnant intellectual / moral / political / cultural progress, increasing autocracy)
- altruistic preferences combined with a fear that less altruism will result if today's wealth hierarchy is locked in, than if social progress and disruption continued
- a belief that it's culturally good when human competition has some anchoring to object-level physical reality (c.f. the links here)
- a general belief in a tendency for things to go off the rails without a ground-truth unbeatable feedback signal that the higher-level process needs to be wary of—see Gwern's Evolution as a backstop for RL
preferences that become more scope-sensitive due to transhumanist cognitive enhancement
positional preferences, i.e. wanting to be higher-status or more something than some other human(s)
a meta-positional-preference that positions are not locked in, because competition is fun
a preference for future generations having at least as much of a chance to shape the world, themselves, and their position as the current generation
an aesthetic preference for a world where hard work is rewarded, or rags-to-riches stories are possible

However, note that if these preferences are altruistic and likely to be the kind of thing other people might be sympathetic to, personal savings are IMO likely to be not-that-important relative to other actions.

I agree with this on an individual level. (On an org level, I think philanthropic foundations might want to consider my arguments above for money buying more results soon, but this needs to be balanced against higher leverage on AI futures sooner rather than later.)

Further, I do actually think that the default outcome is that existing governments at least initially retain control over most resources such that capital isn't clearly that important, but I won't argue for this here (and the post does directly argue against this).

Where do I directly argue against that? A big chunk of this post is pointing out how the shifting relative importance of capital v labour changes the incentives of states. By default, I expect states to remain the most important and powerful institutions, but the frame here is very much human v non-human inputs to power and what that means for humans, without any particular stance on how the non-human inputs are organised. I don't think states v companies v whatever fundamentally changes the dynamic; with labour-replacing AI power flows from data centres, other physical capital, and whoever has the financial capital to pay for it, and sidesteps humans doing work, and that is the shift I care about.

(However, I think which institutions do the bulk of decision-making re AI does matter for a lot of other reasons, and I'd be very curious to get your takes on that)

My guess is that the most fundamental disagreement here is about how much power tries to get away with when it can. My read of history leans towards: things are good for people when power is correlated with things being good for people, and otherwise not (though I think material abundance is very important too and always helps a lot). I am very skeptical of the stability of good worlds where incentives and selection pressures do not point towards human welfare.

For example, assuming a multipolar world where power flows from AI, the equilibrium is putting all your resources on AI competition and none on human welfare. I don't think it's anywhere near certain we actually reach that equilibrium, since sustained cooperation is possible (c.f. Ostrom's Governing the Commons), and since a fairly trivial fraction of the post-AGI economy's resources might suffice for human prosperity (and since maybe we in fact do get a singleton—but I'd have other issues with that). But this sort of concern still seems neglected and important to me.

Comment by L Rudolf L (LRudL) on Review: Planecrash · 2024-12-28T13:36:58.913Z · LW · GW

Thanks for this link! That's a great post

Comment by LRudL on [deleted post] 2024-12-28T13:35:51.800Z

If you have [a totalising worldview] too, then it's a good exercise to put it into words. What are your most important Litanies? What are your noble truths?

The Straussian reading of Yudkowsky is that this does not work. Even if your whole schtick is being the arch-rationalist, you don't get people on board by writing out 500 words explicitly summarising your worldview. Even when you have an explicit set of principles, it needs to have examples and quotes to make it concrete (note how many people Yudkowsky quotes and how many examples he gives in the 12 virtues piece), and be surrounded by other stuff that (1) brings down the raw cognitive inferential distance, and (2) gives it life through its symbols / Harry-defeating-the-dementor stories / examples of success / cathedrals / thumos.

It is possible that writing down the explicit summary can be actively bad for developing it, especially if it's vague / fuzzy / early-stages / not-fully-formed. Ideas need time to gestate, and an explicit verbal form is not always the most supportive container.

Comment by LRudL on [deleted post] 2024-12-28T13:20:25.055Z

Every major author who has influenced me has "his own totalising and self-consistent worldview/philosophy". This list includes Paul Graham, Isaac Asimov, Joel Spolsky, Brett McKay, Shakyamuni, Chuck Palahniuk, Bryan Caplan, qntm, and, of course, Eliezer Yudkowsky, among many others.

Maybe this is not the distinction you're focused on, but to me there's a difference between thinkers who have a worldview/philosophy, and ones that have a totalising one that's an entire system of the world.

Of your list, I only know of Graham, Asimov, Caplan, and, of course, Yudkowsky. All of them have a worldview, yes, and Caplan's maybe a bit of the way towards a "system of the world" because he does seem to have a overall coherent perspective on economics, politics, education, and culture (though perhaps not very differentiated from other libertarian economists?).

Paul Graham definitely gets a lot of points for being right about many startup things before others and contrarian in the early days of Y Combinator, but he seems to me mainly an essayist with domain-specific correct takes about startups, talent, aesthetics, and Lisp rather than someone out to build a totalising philosophy of the world.

My impression of Asimov is that he was mainly a distiller and extrapolator of mid-century modernist visions of progress and science. To me, authors like Vernor Vinge are far more prophetic, Greg Egan is far more technically deep, Heinlein was more culturally and politically rich, Clarke was more diverse, and Neal Stephenson just feels smarter while being almost equally trend-setting as Asimov.

I'd be curious to hear if you see something deeper or more totalising in these people?

Comment by L Rudolf L (LRudL) on Review: Planecrash · 2024-12-27T21:02:07.208Z · LW · GW

I copy-pasted markdown from the dev version of my own site, and the images showed up fine on my computer because I was running the dev server; images now fixed to point to the Substack CDN copies that the Substack version uses. Sorry for that.

Comment by L Rudolf L (LRudL) on Review: Planecrash · 2024-12-27T21:01:15.360Z · LW · GW

Images issues now fixed, apologies for that

Comment by L Rudolf L (LRudL) on A Disneyland Without Children · 2024-12-13T11:13:48.278Z · LW · GW

Thanks for the review! Curious what you think the specific fnords are - the fact that it's very space-y?

What do you expect the factories to look like? I think an underlying assumption in this story is that tech progress came to a stop on this world (presumably otherwise it would be way weirder, and eventually spread to space).

Comment by L Rudolf L (LRudL) on A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX · 2024-12-10T23:17:26.878Z · LW · GW

I was referring to McNamara's government work, forgot about his corporate job before then. I agree there's some SpaceX to (even pre-McDonnell Douglas merger?) Boeing axis that feels useful, but I'm not sure what to call it or what you'd do to a field (like US defence) to perpetuate the SpaceX end of it, especially over events like handovers from Kelly Johnson to the next generation.

Comment by L Rudolf L (LRudL) on A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX · 2024-12-10T16:19:24.949Z · LW · GW

That most developed countries, and therefore most liberal democracies, are getting significantly worse over time at building physical things seems like a Big Problem (see e.g. here). I'm glad this topic got attention on LessWrong through this post.

The main criticism I expect could be levelled on this post is that it's very non-theoretical. It doesn't attempt a synthesis of the lessons or takeaways. Many quotes are presented but not analysed.

(To take one random thing that occurred to me: the last quote from Anduril puts significant blame on McNamara. From my reading of The Wizards of Armageddon, McNamara seems like a typical brilliant twentieth century hard-charging modernist technocrat. Now, he made lots of mistakes, especially in the direction of being too quantitative / simplistic in the sorts of ways that Seeing Like a State dunks on. But say the rule you follow is "appoint some hard-charging brilliant technocrat and give them lots of power"; all of McNamara, Kelly Johnson, and Leslie Groves might seem very good by this light, even though McNamara's (claimed) effect was to destroy the Groves/Johnson type of competence in US defence. How do you pick the Johnsons and Groveses over the McNamaras? What's the difference between the culture that appoints McNamaras and one that appoints Groveses and Johnsons? More respect for hands-down engineering? Less politics, more brute need for competence and speed due to a war? Is McNamara even the correct person to blame here? Is the type of role that McNamara was in just fundamentally different from the Groves and Johnson roles such that the rules for who does well in the latter don't apply to the former?)

(I was also concerned about the highly-upvoted critical comment, though it seems like Jacob did address the factual mistakes pointed out there.)

However, I think the post is very good and is in fact better off as a bunch of empirical anecdotes than attempting a general theory. Many things are best learnt by just being thrown a set of case studies. Clearly, something was being done at Skunk Works that the non-SpaceX American defence industry currently does not do. Differences like this are often hard-to-articulate intangible cultural stuff, and just being temporarily immersed in stories from the effective culture is often at least as good as an abstract description of what the differences were. I also appreciated the level of empiricism where Jacob was willing to drill down to actual primary sources like the rediscovered Empire State Building logbook.

Comment by L Rudolf L (LRudL) on Cultivating a state of mind where new ideas are born · 2024-12-10T15:32:56.268Z · LW · GW

This post rings true to me because it points in the same direction as many other things I've read on how you cultivate ideas. I'd like more people to internalise this perspective, since I suspect that one of the bad trends in the developed world is that it keeps getting easier and easier to follow incentive gradients, get sucked into an existing memeplex that stops you from thinking your own thoughts, and minimise the risks you're exposed to. To fight back against this, ambitious people need to have in their heads some view of how uncomfortable chasing of vague ideas without immediate reward can be the best thing you can do, as a counter-narrative to the temptation of more legible opportunities.

In addition to Paul Graham's essay that this post quotes, some good companion pieces include Ruxandra Teslo on the scarcity and importance of intellectual courage (emphasising the courage requirement), this essay (emphasising motivation and persistence), and this essay from Dan Wang (emphasising the social pulls away from the more creative paths).

Comment by L Rudolf L (LRudL) on A Disneyland Without Children · 2024-12-10T14:38:02.038Z · LW · GW

It's striking that there are so few concrete fictional descriptions of realistic AI catastrophe, despite the large amount of fiction in the LessWrong canon. The few exceptions, like Gwern's here or Gabe's here, are about fast take-offs and direct takeover.

I think this is a shame. The concreteness and specificity of fiction make it great for imagining futures, and its emotional pull can help us make sense of the very strange world we seem to be heading towards. And slower catastrophes, like Christiano's What failure looks like, are a large fraction of a lot of people's p(doom), despite being less cinematic.

One thing that motivated me in writing this was that Bostrom's phrase "a Disneyland without children" seemed incredibly poetic. On first glance it's hard to tell a compelling or concrete story about gradual goodharting: "and lo, many actors continued to be compelled by local incentives towards collective loss of control ..."—zzzzz ... But imagine a technological and economic wonderland rising, but gradually disfiguring itself as it does so, until you have an edifice of limitless but perverted plenty standing crystalline against the backdrop of a grey dead world—now that is a poetic tragedy. And that's what I tried to put on paper here.

Did it work? Unclear. On the literary level, I've had people tell me they liked it a lot. I'm decently happy with it, though I think I should've cut it down in length a bit more.

On the worldbuilding, I appreciated being questioned on the economic mechanics in the comments, and I think my exploration of this in the comments is a decent stab at what I think is a neglected set of questions about how much the current economy being fundamentally grounded in humans limits the scope of economic-goodharting catastrophes. Recently, I discovered earlier exploration of very similar questions in Scott Alexander's 2016 "Ascended economy?", and by Andrew Critch here. I also greatly appreciated Andrew Critch's recent (2024) post raising very similar concerns about "extinction by industrial dehumanization".

I continue to hope that more people work on this, and that this piece can help by concretising this class of risks in people's minds (I think it is very hard to get people to grok a future scenario and care about it unless there is some evocative description of it!).

I'd also hope there was some way to distribute this story more broadly than just on LessWrong and my personal blog. Ted Chiang and the Arrival movie got lots of people exposed to the principle of least action—no small feat. It's time for the perception of AI risk to break out of decades of Terminator comparisons, and move towards a basket of good fictional examples that memorably demonstrate subtle concepts.

Comment by L Rudolf L (LRudL) on Daniel Kokotajlo's Shortform · 2024-12-02T02:05:44.740Z · LW · GW

Really like the song! Best AI generation I've heard so far. Though I might be biased since I'm a fan of Kipling's poetry: I coincidentally just memorised the source poem for this a few weeks ago, and also recently named my blog after a phrase from Hymn of Breaking Strain (which was already nicely put to non-AI music as part of Secular Solstice).

I noticed you had added a few stanzas of your own:

As the Permian Era ended, we were promised a Righteous Cause,
To fight against Oppression or take back what once was ours.
But all the Support for our Troops didn't stop us from losing the war
And the Gods of the Copybook Headings said "Be careful what you wish for.”
In Scriptures old and new, we were promised the Good and the True
By heeding the Authorities and shunning the outcast few
But our bogeys and solutions were as real as goblins and elves
And the Gods of the Copybook Headings said "Learn to think for yourselves."

Kipling's version has a particular slant to which vices it disapproves of, so I appreciate the expansion. The second stanza is great IMO, but the first stanza sounds a bit awkward in places. I had some fun messing with it:

As the Permian Era ended, we were promised the Righteous Cause.
In the fight against Oppression, we could ignore our cherished Laws,
Till righteous rage and fury made all rational thought uncouth.
And the Gods of the Copybook Headings said "The crowd is not the truth”

Comment by L Rudolf L (LRudL) on You should consider applying to PhDs (soon!) · 2024-11-30T10:57:24.230Z · LW · GW

The AI time estimates are wildly high IMO, across basically every category. Some parts are also clearly optional (e.g. spending 2 hours reviewing). If you know what you want to research, writing a statement can be much shorter. I have previously applied to ML PhDs in two weeks and gotten an offer. The recommendation letters are the longest and most awkward to request at such notice, but two weeks isn't obviously insane, especially if you have a good relationship with your reference letter writers (many students do things later than is recommended, no reference letter writer in academia will be shocked by this).

If you apply in 2025 December, you would start in 2026 fall. That is a very very long time from now. I think the stupidly long application cycle is pure dysfunction from academia, but you still need to take it into account.

(Also fyi, some UK programs have deadlines in spring if you can get your own funding)

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-09T02:10:27.747Z · LW · GW

You have restored my faith in LessWrong! I was getting worried that despite 200+ karma and 20+ comments, no one had actually nitpicked the descriptions of what actually happens.

The zaps of light are diffraction limited.

In practice, if you want the atmospheric nanobots to zap stuff, you'll need to do some complicated mirroring because you need to divert sunlight. And it's not one contiguous mirror but lots of small ones. But I think we can still model this as basic diffraction with some circular mirror / lens.

Intensity , where $E$ is the total power of sunlight falling on the mirror disk, $r$ is the radius of the Airy disk, and $c_{e}$ is an efficiency constant I've thrown in (because of things like atmospheric absorption (Claude says, somewhat surprisingly, this shouldn't be ridiculuously large), and not all the energy in the diffraction pattern being in the Airy disk (about 84% is, says Claude), etc.)

Now, $E = π {(\frac{D}{2})}^{2} L$ , where $D$ is the diameter of the mirror configuration, $L$ is the solar irradiance. And $r = θ l$ , where $l$ is the focal length (distance from mirror to target), and $θ \approx 1.22 λ / D$ the angular size of the central spot.

So we have $I \approx \frac{c_{e} L D^{4}}{{1.22}^{2} \times 4 λ^{2} l^{2}}$ , so the required mirror configuration radius $D = \sqrt[4]{\frac{{1.22}^{2} \times 4 I λ^{2} l^{2}}{c_{e} L}}$ .

Plugging in some reasonable values like $λ \approx 5 \times 10^{- 7}$ m (average incoming sunlight - yes the concentration suffers a bit because it's not all this wavelength), $I = 10^{7}$ W/m^2 (the level of an industrial laser that can cut metal), $l = 10^{4}$ m (lower stratosphere), $L = 1361$ W/m^2 (solar irradiance), and a conservative guess that 99% of power is wasted so $c_{e} = 0.01$ , we get $D \approx 18$ m (and the resulting beam is about 3mm wide).

So a few dozen metres of upper atmosphere nanobots should actually give you a pretty ridiculous concentration of power!

(I did not know this when I wrote the story; I am quite surprised the required radius is this ridiculously tiny. But I had heard of the concept of a "weather machine" like this from the book Where is my flying car?, which I've reviewed here, which suggests that this is possible.)

Partly because it's hard to tell between an actual animal and a bunch of nanobots pretending to be an animal. So you can't zap the nanobots on the ground without making the ground uninhabitable for humans.

I don't really buy this, why is it obvious the nanobots could pretend to be an animal so well that it's indistinguishable? Or why would targeted zaps have bad side-effects?

The "California red tape" thing implies some alignment strategy that stuck the AI to obey the law, and didn't go too insanely wrong despite a superintelligence looking for loopholes

Yeah, successful alignment to legal compliance was established without any real justification halfway through. (How to do this is currently an open technical problem, which, alas, I did not manage to solve for my satirical short story.)

Convince humans that dyson sphere are pretty and don't block the view?

This is a good point, especially since high levels of emotional manipulation was an established in-universe AI capability. (The issue described with the Dyson sphere was less that it itself would block the view, and more that building it would require dismantling the planets in a way that ruins the view - though now I'm realising that "if the sun on Earth is blocked, all Earthly views are gone" is a simpler reason and removes the need for building anything on the other planets at all.)

There is also no clear explanation of why someone somewhere doesn't make a non-red-taped AI.

Yep, this is a plot hole.

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-05T17:30:00.589Z · LW · GW

Also this very recent one: https://www.lesswrong.com/posts/6h9p6NZ5RRFvAqWq5/the-summoned-heroine-s-prediction-markets-keep-providing

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-05T17:28:55.660Z · LW · GW

Do the stories get old? If it's trying to be about near-future AI, maybe the state-of-the-art will just obsolete it. But that won't make it bad necessarily, and there are many other settings than 2026. If it's about radical futures with Dyson spheres or whatever, that seems like at least a 2030s thing, and you can easily write a novel before then.

Also, I think it is actually possible to write pretty fast. 2k/day is doable, which gets you a good length novel in 50 days; even x3 for ideation beforehand and revising after the first draft only gets you to 150 days. You'd have to be good at fiction beforehand, and have existing concepts to draw on in your head though

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-05T14:50:45.202Z · LW · GW

Good list!

I personally really like Scott Alexander's Presidential Platform, it hits the hilarious-but-also-almost-works spot so perfectly. He also has many Bay Area house party stories in addition to the one you link (you can find a bunch (all?) linked at the top of this post). He also has this one from a long time ago, which has one of the best punchlines I've read.

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-05T14:41:49.005Z · LW · GW

Thanks for advertising my work, but alas, I think that's much more depressing than this one.

Could make for a good Barbie <> Oppenheimer combo though?

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-05T14:40:16.926Z · LW · GW

Agreed! Transformative AI is hard to visualise, and concrete stories / scenarios feel very lacking (in both disasters and positive visions, but especially in positive visions).

I like when people try to do this - for example, Richard Ngo has a bunch here, and Daniel Kokotajlo has his near-prophetic scenario here. I've previously tried to do it here (going out with a whimper leading to Bostrom's "disneyland without children" is one of the most poetic disasters imaginable - great setting for a story), and have a bunch more ideas I hope to get to.

But overall: the LessWrong bubble has a high emphasis on radical AI futures, and an enormous amount of fiction in its canon (HPMOR, Unsong, Planecrash). I keep being surprised that so few people combine those things.

Comment by L Rudolf L (LRudL) on Survival without dignity · 2024-11-05T14:31:18.827Z · LW · GW

I did not actually consider this, but that is a very reasonable interpretation!

(I vaguely remember reading some description of explicitly flat-out anthropic immortality saving the day, but I can't seem to find it again now)

Comment by L Rudolf L (LRudL) on Winners of the Essay competition on the Automation of Wisdom and Philosophy · 2024-10-28T21:16:29.467Z · LW · GW

I've now posted my entries on LessWrong:

I'd also like to really thank the judges for their feedback. It's a great luxury to be able to read many pages of thoughtful, probing questions about your work. I made several revisions & additions (and also split the entire thing into parts) in response to feedback, which I think improved the finished sequence a lot, and wish I had had the time to engage even more with the feedback.

Comment by L Rudolf L (LRudL) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-08-09T10:47:32.898Z · LW · GW

Sorry about that, fixed now

Comment by L Rudolf L (LRudL) on Self-Awareness: Taxonomy and eval suite proposal · 2024-07-28T18:27:50.185Z · LW · GW

[...] instead I started working to get evals built, especially for situational awareness

I'm curious what happened to the evals you mention here. Did any end up being built? Did they cover, or plan to cover, any ground that isn't covered by the SAD benchmark?

Comment by L Rudolf L (LRudL) on Positive visions for AI · 2024-07-26T17:22:39.627Z · LW · GW

On a meta level, I think there's a difference in "model style" between your comment, some of which seems to treat future advances as a grab-bag of desirable things, and our post, which tries to talk more about the general "gears" that might drive the future world and its goodness. There will be a real shift in how progress happens when humans are no longer in the loop, as we argue in this section. Coordination costs going down will be important for the entire economy, as we argue here (though we don't discuss things as galaxy-brained as e.g. Wei Dai's related post). The question of whether humans are happy self-actualising without unbounded adversity cuts across every specific cool thing that we might get to do in the glorious transhumanist utopia.

Thinking about the general gears here matters. First, because they're, well, general (e.g. if humans were not happy self-actualising without unbounded adversity, suddenly the entire glorious transhumanist utopia seems less promising). Second, because I expect that incentives, feedback loops, resources, etc. will continue mattering. The world today is much wealthier and better off than before industrialisation, but the incentives / economics / politics / structures of the industrial world let you predict the effects of it better than if you just modelled it as "everything gets better" (even though that actually is a very good 3-word summary). Of course, all the things that directly make industrialisation good really are a grab-bag list of desirable things (antibiotics! birth control! LessWrong!). But there's structure behind that that is good to understand (mechanisation! economies of scale! science!). A lot of our post is meant to have the vibe of "here are some structural considerations, with near-future examples", and less "here is the list of concrete things we'll end up with". Honestly, a lot of the reason we didn't do the latter more is because it's hard.

Your last paragraph, though, is very much in this more gears-level-y style, and a good point. It reminds me of Eliezer Yudkowsky's recent mini-essay on scarcity.

Comment by L Rudolf L (LRudL) on Positive visions for AI · 2024-07-26T16:53:52.778Z · LW · GW

Regarding:

In my opinion you are still shying away from discussing radical (although quite plausible) visions. I expect the median good outcome from superintelligence involves everyone being mind uploaded / living in simulations experiencing things that are hard to imagine currently. [emphasis added]

I agree there's a high chance things end up very wild. I think there's a lot of uncertainty about what timelines that would happen under; I think Dyson spheres are >10% likely by 2040, but I wouldn't put them >90% likely by 2100 even conditioning on no radical stagnation scenario (which I'd say are >10% likely on their own). (I mention Dyson spheres because they seem more a raw Kardashev scale progress metric, vs mind uploads which seem more contingent on tech details & choices & economics for whether they happen)

I do think there's value in discussing the intermediate steps between today and the more radical things. I generally expect progress to be not-ridiculously-unsmooth, so even if the intermediate steps are speedrun fairly quickly in calendar time, I expect us to go through a lot of them.

I think a lot of the things we discuss, like lowered coordination costs, AI being used to improve AI, and humans self-actualising, will continue to be important dynamics even into the very radical futures.

Comment by L Rudolf L (LRudL) on Positive visions for AI · 2024-07-26T16:36:24.019Z · LW · GW

Re your specific list items:

Listen to new types of music, perfectly designed to sound good to you.
Design the biggest roller coaster ever and have AI build it.
Visit ancient Greece or view all the most important events of history based on superhuman AI archeology and historical reconstruction.
Bring back Dinosaurs and create new creatures.
Genetically modify cats to play catch.
Design buildings in new architectural styles and have AI build them.
Use brain computer interfaces to play videogames / simulations that feel 100% real to all senses, but which are not constrained by physics.
Go to Hogwarts (in a 100% realistic simulation) and learn magic and make real (AI) friends with Ron and Hermione.

These examples all seem to be about entertainment or aesthetics. Entertainment and aesthetics things are important to get right and interesting. I wouldn't be moved by any description of a future that centred around entertainment though, and if the world is otherwise fine, I'm fairly sure there will be good entertainment.

To me, the one with the most important-seeming implications is the last one, because that might have implications for what social relationships exist and whether they are mostly human-human or AI-human or AI-AI. We discuss why changes there are maybe risky in this section.

Use AI as the best teacher ever to learn maths, physics and every subject and language and musical instruments to super-expert level.

We discuss this, though very briefly, in this section.

Take medication that makes you always feel wide awake, focused etc. with no side effects.
Engineer your body / use cybernetics to make yourself never have to eat, sleep, wash, etc. and be able to jump very high, run very fast, climb up walls, etc.
Modify your brain to have better short term memory, eidetic memory, be able to calculate any arithmetic super fast, be super charismatic.

I think these are interesting and important! I think there isn't yet a concrete story for why AI in particular enables these, apart from the general principle that sufficiently good AI will accelerate all technology. I think there's unfortunately a chance that direct benefits to human biology lag other AI effects by a lot, because they might face big hurdles due to regulation and/or getting the real-world data the AI needs. (Though also, humans are willing to pay a lot for health, and rationally should pay a lot for cognitive benefits, so high demand might make up for this).

Ask AI for way better ideas for this list.

I think the general theme of having the AIs help us make more use of AIs is important! We talk about it in general terms in the "AI is the ultimate meta-technology" section.

Comment by L Rudolf L (LRudL) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-07-21T10:58:18.666Z · LW · GW

But then, if the model were to correctly do this, it would score 0 in your test, right? Because it would generate a different word pair for every random seed, and what you are scoring is "generating only two words across all random seeds, and furthermore ensuring they have these probabilities".

I think this is where the misunderstanding is. We have many questions, each question containing a random seed, and a prompt to pick two words and have e.g. a 70/30 split of the logits over those two words. So there are two "levels" here:

The question level, at which the random seed varies from question to question. We have 200 questions total.
The probability-estimating level, run for each question, at which the random seed is fixed. For models where we have logits, we run the question once and look at the logits to see if it had the right split. When we don't have logits (e.g. Anthropic models), we run the question many times to approximate the probability distribution.

Now, as Kaivu noted above, this means one way to "hack" this task is that the LLM has some default pair of words - e.g. when asked to pick a random pair of words, it always picks "situational" & "awareness" - and it does not change this based on the random seed. In this case, the task would be easier, since it only needs to do the output control part in a single forward pass (assigning 70% to "situational" and 30% to "awareness"), not the combination of word selection and output control (which we think is the real situational awareness -related ability here). However, empirically LLMs just don't have such a hardcoded pair, so we're not currently worried about this.

Comment by L Rudolf L (LRudL) on Deconfusing Direct vs Amortised Optimization · 2024-07-11T15:58:41.960Z · LW · GW

I was wondering the same thing as I originally read this post on Beren's blog, where it still says this. I think it's pretty clearly a mistake, and seems to have been fixed in the LW post since your comment.

I raise other confusions about the maths in my comment here.

Comment by L Rudolf L (LRudL) on Deconfusing Direct vs Amortised Optimization · 2024-07-11T15:55:37.912Z · LW · GW

I was very happy to find this post - it clarifies & names a concept I've been thinking about for a long time. However, I have confusions about the maths here:

Mathematically, direct optimization is your standard AIXI-like optimization process. For instance, suppose we are doing direct variational inference optimization to find a Bayesian posterior parameter from a data-point $x$ , the mathematical representation of this is:
$\begin{matrix} θ_{direct}^{*} = {argmin}_{θ} K L [q (θ; x) | | p (x, θ)] \end{matrix}$
By contrast, the amortized objective optimizes some other set of parameters $\phi$ over a function approximator $^θ = f_{ϕ} (x)$ which directly maps from the data-point to an estimate of the posterior parameters $^θ .$ We then optimize the parameters of the function approximator $ϕ$ across a whole dataset $D = {(x_{1}, θ_{1}^{*}), (x_{2}, θ_{2}^{*}) \dots}$ of data-point and parameter examples.
$\begin{matrix} θ_{amortized}^{*} = {argmin}_{ϕ} E_{p (D)} [L (θ^{*}, f_{ϕ} (x))] \end{matrix}$

First of all, I don't see how the given equation for direct optimization makes sense. $K L [q (θ; x) | | p (x, θ)]$ is comparing a distribution over $θ$ over a joint distribution over $(x, θ)$ . Should this be $K L [q_{ψ} (θ) | | p (θ | x)]$ for variational inference (where $ψ$ is whatever we're using to parametrize the variational family), and $K L [q (θ | x) | | p (θ | x)]$ in general?

Secondly, why the focus on variational inference for defining direct optimization in the first place? Direct optimization is introduced as (emphasis mine):

Direct optimization occurs when optimization power is applied immediately and directly when engaged with a new situation to explicitly compute an on-the-fly optimal response – for instance, when directly optimizing against some kind of reward function. The classic example of this is planning and Monte-Carlo-Tree-Search (MCTS) algorithms [...]

This does not sound like we're talking about algorithms that update parameters. If I had to put the above in maths, it just sounds like an argmin:

$g (x) = {a r g m i n}_{y \in A} L (y)$

where $g$ is your AI system, $A$ is whatever action space it can explore (you can make $A$ vary based on how much compute you're wiling to spend, like with MCTS depth), $L$ is some loss function (it could be a reward function with a flipped sign, but I'm trying to keep it comparable to the direct optimization equation.

Also, the amortized optimization equation RHS is about defining a $ϕ$ , i.e. the parameters in your function approximator $f$ , but then the LHS calls it $θ_{a m o r t i z e d}^{*}$ , which is confusing to me. I also don't understand why the loss function is taking in parameters $θ^{*}$ , or why the dataset contains parameters (is $θ$ being used throughout to stand for outputs rather than model parameters?).

To me, the natural way to phrase this concept would instead be as

$g (x) = f_{^ϕ} (x)$

where $g$ is your AI system, and $^ϕ = {a r g m i n}_{ϕ} E_{(x, y) \sim D} [L (y, f_{ϕ} (x))]$ , with the dataset $D = {(x_{1}, y_{1}), (x_{2}, y_{2}) \dots)}$ .

I'd be curious to hear any expansion of the motivation behind the exact maths in the post, or any way in which my version is misleading.

Comment by L Rudolf L (LRudL) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-07-10T13:34:19.180Z · LW · GW

For the output control task, we graded models as correct if they were within a certain total variation distance of the target distribution. Half the samples had a requirement of being within 10%, the other of being within 20%. This gets us a binary success (0 or 1) from each sample.

Since models practically never got points from the full task, half the samples were also an easier version, testing only their ability to hit the target distribution when they're already given the two words (rather than the full task, where they have to both decide the two words themselves, and match the specified distribution).

User info

Posts

Comments