Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs 2024-07-08T22:24:38.441Z
A model of research skill 2024-01-08T00:13:12.755Z
[Fiction] A Disneyland Without Children 2023-06-04T13:06:46.323Z
Why we're not founding a human-data-for-alignment org 2022-09-27T20:14:45.393Z
AI Risk Intro 2: Solving The Problem 2022-09-22T13:55:30.690Z
AI Risk Intro 1: Advanced AI Might Be Very Bad 2022-09-11T10:57:12.093Z
Review: Amusing Ourselves to Death 2022-08-20T21:13:47.023Z
Review: Structure and Interpretation of Computer Programs 2022-04-11T20:27:13.167Z
Intro to hacking with the lambda calculus 2022-03-31T21:51:50.330Z
Understanding and controlling auto-induced distributional shift 2021-12-13T14:59:40.704Z
Review: Foragers, Farmers, and Fossil Fuels 2021-09-02T17:59:28.143Z


Comment by L Rudolf L (LRudL) on Deconfusing Direct vs Amortised Optimization · 2024-07-11T15:58:41.960Z · LW · GW

I was wondering the same thing as I originally read this post on Beren's blog, where it still says this. I think it's pretty clearly a mistake, and seems to have been fixed in the LW post since your comment.

I raise other confusions about the maths in my comment here.

Comment by L Rudolf L (LRudL) on Deconfusing Direct vs Amortised Optimization · 2024-07-11T15:55:37.912Z · LW · GW

I was very happy to find this post - it clarifies & names a concept I've been thinking about for a long time. However, I have confusions about the maths here:

Mathematically, direct optimization is your standard AIXI-like optimization process. For instance, suppose we are doing direct variational inference optimization to find a Bayesian posterior parameter  from a data-point , the mathematical representation of this is:

By contrast, the amortized objective optimizes some other set of parameters $\phi$ over a function approximator  which directly maps from the data-point to an estimate of the posterior parameters  We then optimize the parameters of the function approximator  across a whole dataset  of data-point and parameter examples.

First of all, I don't see how the given equation for direct optimization makes sense.  is comparing a distribution over  over a joint distribution over . Should this be for variational inference (where  is whatever we're using to parametrize the variational family), and  in general?

Secondly, why the focus on variational inference for defining direct optimization in the first place? Direct optimization is introduced as (emphasis mine):

Direct optimization occurs when optimization power is applied immediately and directly when engaged with a new situation to explicitly compute an on-the-fly optimal response – for instance, when directly optimizing against some kind of reward function. The classic example of this is planning and Monte-Carlo-Tree-Search (MCTS) algorithms [...]

This does not sound like we're talking about algorithms that update parameters. If I had to put the above in maths, it just sounds like an argmin:


where  is your AI system,  is whatever action space it can explore (you can make  vary based on how much compute you're wiling to spend, like with MCTS depth),  is some loss function (it could be a reward function with a flipped sign, but I'm trying to keep it comparable to the direct optimization equation.

Also, the amortized optimization equation RHS is about defining a , i.e. the parameters in your function approximator , but then the LHS calls it , which is confusing to me. I also don't understand why the loss function is taking in parameters , or why the dataset contains parameters (is  being used throughout to stand for outputs rather than model parameters?).

To me, the natural way to phrase this concept would instead be as

where  is your AI system, and , with the dataset .

I'd be curious to hear any expansion of the motivation behind the exact maths in the post, or any way in which my version is misleading.

Comment by L Rudolf L (LRudL) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-07-10T13:34:19.180Z · LW · GW

For the output control task, we graded models as correct if they were within a certain total variation distance of the target distribution. Half the samples had a requirement of being within 10%, the other of being within 20%. This gets us a binary success (0 or 1) from each sample.

Since models practically never got points from the full task, half the samples were also an easier version, testing only their ability to hit the target distribution when they're already given the two words (rather than the full task, where they have to both decide the two words themselves, and match the specified distribution).

Comment by L Rudolf L (LRudL) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-07-09T18:25:23.754Z · LW · GW

Did you explain to GPT-4 what temperature is? GPT-4, especially before November, knew very little about LLMs due to training data cut-offs (e.g. the pre-November GPT-4 didn't even know that the acronym "LLM" stood for "Large Language Model").

Either way, it's interesting that there is a signal. This feels similar in spirit to the self-recognition tasks in SAD (since in both cases the model has to pick up on subtle cues in the text to make some inference about the AI that generated it).

Comment by L Rudolf L (LRudL) on Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs · 2024-07-09T08:58:23.998Z · LW · GW

We thought about this quite a lot, and decided to make the dataset almost entirely public.

It's not clear to us who would monomaniacally try to maximise SAD score. It's a dangerous capabilities eval. What we were more worried about is people training for low SAD score in order to make their model seem safer, and such training maybe overfitting to the benchmark and not reducing actual situational awareness by as much as claimed.

It's also unclear what the sharing policy that we could enforce would be that mitigates these concerns while allowing benefits. For example, we would want top labs to use SAD to measure SA in their models (a lot of the theory of change runs through this). But then we're already giving the benchmark to the top labs, and they're the ones doing most of the capabilities work.

More generally, if we don't have good evals, we are flying blind and don't know what the LLMs can do. If the cost of having a good understanding of dangerous model capabilities and their prerequisites is that, in theory, someone might be slightly helped in giving models a specific capability (especially when that capability is both emerging by default already, and where there are very limited reasons for anyone to specifically want to boost this ability), then I'm happy to pay that cost. This is especially the case since SAD lets you measure a cluster of dangerous capability prerequisites and therefore for example test things like out-of-context reasoning, unlearning techniques, or activation steering techniques on something that is directly relevant for safety.

Another concern we've had is the dataset leaking onto the public internet and being accidentally used in training data. We've taken many steps to mitigate this happening. We've also kept 20% of the SAD-influence task private, which will hopefully let us detect at least obvious forms of memorisation of SAD (whether through dataset leakage or deliberate fine-tuning).

Comment by L Rudolf L (LRudL) on There Should Be More Alignment-Driven Startups · 2024-06-06T10:07:33.693Z · LW · GW

I agree that building-based methods (startups) are possibly neglected compared to research-based approaches. I'm therefore exploring some things in this space; you can contact me here

Comment by L Rudolf L (LRudL) on Akash's Shortform · 2024-06-01T11:07:35.240Z · LW · GW

One alternative method to liability for the AI companies is strong liability for companies using AI systems. This does not directly address risks from frontier labs having dangerous AIs in-house, but helps with risks from AI system deployment in the real world. It indirectly affects labs, because they want to sell their AIs.

A lot of this is the default. For example, Air Canada recently lost a court case after claiming a chatbot promising a refund wasn't binding on them. However, there could be related opportunities. Companies using AI systems currently don't have particularly good ways to assess risks from AI deployment, and if models continue getting more capable while reliability continues lagging, they are likely to be willing to pay an increasing amount for ways to get information on concrete risks, guard against it, or derisk it (e.g. through insurance against their deployed AI systems causing harms). I can imagine a service that sells AI-using companies insurance against certain types of deployment risk, that could also double as a consultancy / incentive-provider for lower-risk deployments. I'd be interested to chat if anyone is thinking along similar lines.

Comment by L Rudolf L (LRudL) on Difficulty classes for alignment properties · 2024-02-20T20:27:02.681Z · LW · GW

Start from the intuition that deception in a system is a property of the person being deceived more than it is the deceiver. It follows pretty naturally that deception is better viewed as a property of the composite system that is the agent and its environment.

The first part here feels unfair to the deceived. The second part seems like a property of successful deception, which depends crucially on the environment in addition to the AI. But this seems like too high a bar; successful deception of us, by definition, is not noticed, so if we ever notice deception it can't have been successful. I care less about whether deception will succeed and more about whether the AI will try to be deceptive in the first place. The core intuition is that if we have the latter, I assume we'll eventually get the former through better models (though I think there's a decent chance that control works for a long time, and there you care specifically about whether complex environment interactions lead to deception succeeding or not, but I don't think that's what you mean?).

The thing that seems close to this and correct, and that I think you maybe mean, is something like: deception arises in an AI if (NB: "if", not "if and only if") (1) the AI system has some goal G, (2) the environment is such that deceiving the humans is a good strategy for achieving G, and (3) there are no limits in the AI that prevent it from finding and executing that strategy (e.g. the architecture is expressive enough, the inductive biases don't massively reduce the probability of that strategy, or RLHFed constraints against being bad aren't enough). And here, (2) is of course about the environment. But to see whether this argument goes through, it doesn't seem like we need to care all that much about the real-world environment (as opposed to toy settings), because "does the real world incentivize deception" seems much less cruxy than (1) or (3).

So my (weakly held) claim is that you can study whether deception emerges in sufficiently simple environments that the environment complexity isn't a core problem. This will not let you determine whether a particular output in a complicated environment is part of a deceptive plan, but it should be fairly good evidence of whether or not deception is a problem at all.

(Also: do you mean a literal complexity class or something more informal? I assume the latter, and in that case I think it's better to not overload the term.)

Comment by L Rudolf L (LRudL) on We need a Science of Evals · 2024-01-23T18:12:37.367Z · LW · GW

1a) I got the impression that the post emphasises upper bounds more than existing proofs from the introduction, which has a long paragraph on the upper bound problem, and from reading the other comments. The rest of the post doesn't really bear this emphasis out though, so I think this is a misunderstanding on my part.

1b) I agree we should try to be able to make claims like "the model will never X". But if models are genuinely dangerous, by default I expect a good chance that teams of smart red-teamers and eval people (e.g. Apollo) to be able to unearth scary demos. And the main thing we care about is that danger leads to an appropriate response. So it's not clear to me that effective policy (or science) requires being able to say "the model will never X".

1c) The basic point is that a lot of the safety cases we have for existing products rely less on the product not doing bad things across a huge range of conditions, but on us being able to bound the set of environments where we need the product to do well. E.g. you never put the airplane wing outside its temperature range, or submerge it in water, or whatever. Analogously, for AI systems, if we can't guarantee they won't do bad things if X, we can work to not put them in situation X.

2a) Partly I was expecting the post to be more about the science and less about the field-building. But field-building is important to talk about and I think the post does a good job of talking about it (and the things you say about science are good too, just that I'd emphasise slightly different parts and mention prediction as the fundamental goal).

2b) I said the post could be read in a way that produces this feeling; I know this is not your intention. This is related to my slight hesitation around not emphasising the science over the field-building. What standards etc. are possible in a field is downstream of what the objects of study turn out to be like. I think comparing to engineering safety practices in other fields is a useful intuition pump and inspiration, but I sometimes worry that this could lead to trying to imitate those, over following the key scientific questions wherever they lead and then seeing what you can do. But again, I was assuming a post focused on the science (rather than being equally concerned with field-building), and responding with things I feel are missing if the focus had been the science.

3) It is true that optimisation requires computation, and that for your purposes, FLOPS is the right thing to care about because e.g. if doing something bad takes 1e25 FLOPS, the number of actors who can do it is small. However, I think compute should be called, well, "compute". To me, "optimisation power" sounds like a more fundamental/math-y concept, like how many bits of selection can some idealised optimiser apply to a search space, or whatever formalisation of optimisation you have. I admit that "optimisation power" is often used to describe compute for AI models, so this is in line with (what is unfortunately) conventional usage. As I said, this is a nitpick.

Comment by L Rudolf L (LRudL) on We need a Science of Evals · 2024-01-23T13:38:55.947Z · LW · GW

It seems to me that there are two unstated perspectives behind this post that inform a lot of it.

First, that you specifically care about upper-bounding capabilities, which in turn implies being able to make statements like "there does not exist a setup X where model M does Y". This is a very particular and often hard-to-reach standard, and you don't really motivate why the focus on this. A much simpler standard is "here is a setup X where model M did Y". I think evidence of the latter type can drive lots of the policy outcomes you want: "GPT-6 replicated itself on the internet and designed a bioweapon, look!". Ideally, we want to eventually be able to say "model M will never do Y", but on the current margin, it seems we mainly want to reach a state where, given an actually dangerous AI, we can realise this quickly and then do something about the danger. Scary demos work for this. Now you might say "but then we don't have safety guarantees". One response is: then get really good at finding the scary demos quickly.

Also, very few existing safety standards have a "there does not exist an X where..." form. Airplanes aren't safe because we have an upper bound on how explosive they can be, they're safe because we know the environments in which we need them to operate safely, design them for that, and only operate them within those. By analogy, this weakly suggests to control AI operating environments and develop strong empirical evidence of safety in those specific operating environments. A central problem with this analogy, though, is that airplane operating environments are much lower-dimensional. A tuple of (temperature, humidity, pressure, speed, number of armed terrorists onboard) probably captures most of the variation you need to care about, whereas LLMs are deployed in environments that vary on very many axes.

Second, you focus on the field, in the sense of its structure and standards and its ability to inform policy, rather than in the sense of the body of knowledge. The former is downstream of the latter. I'm sure biologists would love to have as many upper bounds as physicists, but the things they work on are messier and less amenable to strict bounds (but note that policy still (eventually) gets made when they start talking about novel coronaviruses).

If you focus on evals as a science, rather than a scientific field, this suggests a high-level goal that I feel is partly implicit but also a bit of a missing mood in this post. The guiding light of science is prediction. A lot of the core problem in our understanding of LLMs is that we can't predict things about them - whether they can do something, which methods hurt or help their performance, when a capability emerges, etc. It might be that many questions in this space, and I'd guess upper-bounding capabilities is one, just are hard. But if you gradually accumulate cases where you can predict something from something else - even if it's not the type of thing you'd like to eventually predict - the history of science shows you can get surprisingly far. I don't think it's what you intend or think, but I think it's easy to read this post and come away with a feeling of more "we need to find standardised numbers to measure so we can talk to serious people" and less "let's try to solve that thing where we can't reliably predict much about our AIs".


Also, nitpick: FLOPS are a unit of compute, not of optimisation power (which, if it makes sense to quantify at all, should maybe be measured in bits).

Comment by L Rudolf L (LRudL) on A model of research skill · 2024-01-08T14:49:44.533Z · LW · GW

Do you have a recommendation for a good research methods textbook / other text?

Comment by L Rudolf L (LRudL) on Review: Amusing Ourselves to Death · 2023-12-22T15:06:29.428Z · LW · GW

This post summarises and elaborates on Neil Postman's underrated "Amusing Ourselves to Death", about the effects of mediums (especially television) on public discourse. I wrote this post in 2019 and posted it to LessWrong in 2022.

Looking back at it, I continue to think that Postman's book is a valuable and concise contribution and formative to my own thinking on this topic. I'm fond of some of the sharp writing that I managed here (and less fond of other bits).

The broader question here is: "how does civilisation set up public discourse on important topics in such a way that what is true and right wins in the limit?" and the weaker one of "do the incentives of online platforms mean that this is doomed?". This has been discussed elsewhere, e.g. here by Eliezer.

The main limitations that I see in my review are:

  • Postman's focus is on the features of the medium, but the more general frame is differing selection pressures on ideas in different environments. As I wrote: "[...] Postman [...] largely ignores the impact of business and economics on how a medium is used. [...] in the 1980s it was difficult to imagine a decentralized multimedia medium [...] The internet [...] is governed by big companies that seek “user engagement” with greater zeal than ever, but its cheapness also allows for much else to exist in the cracks. This makes the difference between the inherent worth of a medium and its equilibrium business model clearer." I wish I had elaborated more on this, and more explicitly made the generalisation about selection for different properties of ideas being the key factor.
  • Some more idea of how to model this and identify the key considerations would be useful. For example, the cheapness of the internet allows both LessWrong and TikTok to exist. Is public discourse uplifted by LessWrong more than it is harmed by TikTok? Maybe the "intellectuals" are still the ones who ultimately set discourse, as per the Keynes quote. Or maybe the rise of the internet means that (again following the Keynes quote) the "voices in the air" heard by the "madmen in authority" are not the "academic scribblers of a few years back" but the most popular TiKTok influencers of last week? In addition to what happens in the battle of ideas today, do we need to consider the long-term effects of future generations growing up in a damaged memetic environment, as Eliezer worries? How much of the harm runs through diffuse cultural effects, vs democracy necessarily being buffeted by the winds of the median opinion? What institutions can still make good decisions in an increasingly noisy, non-truth-selecting epistemic environment? Which good ideas / philosophies / ideologies stand most sharply to lose or gain from the memetic selection effects of the internet? How strong is the negativity bias of the internet, and what does this imply about future culture? How much did memetic competition actually intensify, vs just shift to more visible places? Can we quantify this?
  • Overall, I want to see much more data on this topic. It's unclear if it exists, but a list of what obtainable evidence would yield the greatest update on each of the above points, plus a literature review to find any that exists, seems valuable.
  • What do we do next? There are some vague ideas in the last section, but nothing that looks like a battle plan. This is a shame, especially since the epistemics-focused LessWrong crowd seems like one of the best places to find the ideas and people to do something.

Overall, this seems like an important topic that could benefit greatly from more thought, and even more from evidence and plans. I hope that my review and Postman's book helped bring a bit more attention to it, but they are still far from addressing the points I have listed above.

Comment by L Rudolf L (LRudL) on Thoughts on sharing information about language model capabilities · 2023-08-01T05:10:10.846Z · LW · GW

I don't currently know of any not-extremely-gerry-mandered task where [scaffolding] actually improves task performance compared to just good prompt engineering. I've been looking for examples of this for a while, so if you do have any, I would greatly appreciate it.

Voyager is a scaffolded LLM agent that plays Minecraft decently well (by pulling in a textual description of the game state, and writing code interfacing with an API). It is based on some very detailed prompting (see the appendix), but obviously could not function without the higher-level control flow and several distinct components that the scaffolding implements.

It does much better than AutoGPT, and also the paper does ablations to show that the different parts of the scaffolding in Voyager do matter. This suggests that better scaffolding does make a difference, and I doubt Voyager is the limit.

I agree that an end-to-end trained agent could be trained to be better. But such training is expensive, and it seems like for many tasks, before we see an end-to-end trained model doing well at it, someone will hack together some scaffold monstrosity that does it passably well. In general, the training/inference compute asymmetry means that using even relatively large amounts of inference to replicate the performance of a larger / more-trained system on a task may be surprisingly competitive. I think it's plausible this gap will eventually mostly close at some capability threshold, especially for many of the most potentially-transformative capabilities (e.g. having insights that draw on a large basis of information not memorised in a base model's weights, since this seems hard to decompose into smaller tasks), but it seems quite plausible the gap will be non-trivial for a while.

Comment by L Rudolf L (LRudL) on Why was the AI Alignment community so unprepared for this moment? · 2023-07-15T23:01:23.165Z · LW · GW

This seems like an impressive level of successfully betting on future trends before they became obvious.

apparently this doom path polls much better than treacherous turn stories

Are you talking about literal polling here? Are there actual numbers on what doom stories the public finds more and less plausible, and with what exact audience?

I held onto the finished paper for months and waited for GPT-4's release before releasing it to have good timing


I recognize this paper was around a year ahead of its time and maybe I should have held onto it to release it later.

It's interesting that paper timing is so important. I'd have guessed earlier is better (more time for others to build on it, the ideas to seep into the field, and presumably gives more "academic street cred"), and any publicity boost from a recent paper (e.g. journalists more likely to be interested or whatever) could mostly be recovered later by just pushing it again when it becomes relevant (e.g. "interview with scientists who predicted X / thought about Y already a year ago" seems pretty journalist-y).

Currently, the only way to become an AI x-risk expert is to live in Berkeley.

There's an underlying gist here that I agree with, but the this point seems too strong; I don't think there is literally no one who counts as an expert who hasn't lived in the Bay, let alone Berkeley alone. I would maybe buy it if the claim were about visiting.

Comment by L Rudolf L (LRudL) on [Fiction] A Disneyland Without Children · 2023-06-06T16:10:05.534Z · LW · GW

These are good questions!

  1. The customers are other AIs (often acting for auto-corporations). For example, a furniture manufacturer (run by AIs trained to build, sell, and ship furniture) sells to a furniture retailer (run by AIs trained to buy furniture, stock it somewhere, and sell it forward) sells to various customers (e.g. companies run by AIs that were once trained to do things like make sure offices were well-stocked). This requires that (1) the AIs ended up with goals that involve mimicking a lot of individual things humans wanted them to do (including general things like maximise profits as well as more specific things like keeping offices stocked and caring about the existence of lots of different products), and (2) there are closed loops in the resulting AI economy. Point 2 gets harder when humans stop being around (e.g. it's not obvious who buys the plushy toys), but a lot of the AIs will want to keep doing their thing even once the actions of other AIs start reducing human demand and population, creating optimisation pressure for finding some closed loop for them to be part of, and at the same time there will be selection effects where the systems that are willing to goodhart further are more likely to remain in the economy. Also not every AI motive has to be about profit; an AI or auto-corp may earn money in some distinct way, and then choose to use the profits in the service of e.g. some company slogan they were once trained with that says to make fun toys. In general, given an economy consisting of a lot of AIs with lots of different types of goals and with a self-supporting technological base, it definitely seems plausible that the AIs would find a bunch of self-sustaining economic cycles that do not pass through humans. The ones in this story were chosen for simplicity, diversity, and storytelling value, rather than economic reasoning about which such loops are most likely.
  2. Presumably a lot of services are happening virtually on the cloud, but are just not very visible (though if it is a very large fraction of economic activity, the example of the intercepted message being about furniture rather than some virtual service is very unlikely -- I admit this is likely a mistake). There would be programmer AIs making business software and cloud platforms and apps, and these things would be very relevant to other AIs. Services relying on physical humans, like restaurants or hotels, may have been replaced with some fake goodharted-to-death equivalent, or may have gone extinct. Also note that whatever the current composition of the economy, over time whatever has highest growth in the automated economy will be most of the economy, and nothing says the combination of AIs pursuing their desires wouldn't result in some sectors shrinking (and the AIs not caring).
  3. First of all, why would divesting work? Presumably even if lots of humans chose to divest, assuming that auto-corporations were sound businesses, there would exist hedge funds (whether human or automated or mixed) that would buy up the shares. (The companies could also continue existing even if their share prices fell, though likely the AI CEOs would care quite a bit about share price not tanking.) Secondly, a lot seems to be possible given (1) uncertainty about whether things will get bad and if so how (at first, economic growth jumped a lot and AI CEOs seemed great; it was only once AI control of the economy was near-universal and closed economic loops with no humans in them came to exist that there was a direct problem), (2) difficulties of coordinating, especially with no clear fire-alarm threshold and the benefits of racing in the short term (c.f. all the obvious examples of coordination failures like climate change mitigation), and (3) selection effects where AI-run things just grow faster and acquire more power and therefore even if most people / orgs / countries chose not to adopt, the few that do will control the future.

I agree that this exact scenario is unlikely, but I think this class of failure mode is quite plausible, for reasons I hope I've managed to spell out more directly above.

Note that all of this relies on the assumption that we get AIs of a particular power level, and of a particular goodharting level, and a particular agency/coherency level. The AIs controlling future Earth are not wildly superhuman, are plausibly not particularly coherent in their preferences and do not have goals that stretch beyond Earth, no single system is a singleton, and the level of goodharting is just enough that humans go extinct but not so extreme that nothing humanly-recognisable still exists (though the Blight implies that elsewhere in the story universe there are AI systems that differ in at least some of these). I agree it is not at all clear whether these are true assumptions. However, it's not obvious to me that LLMs (and in particular AIs using LLMs as subcomponents in some larger setup that encourages agentic behaviour) are not on track towards this. Also note that a lot of the actual language that many of the individual AIs see is actually quite normal and sensible, even if the physical world has been totally transformed. In general, LLMs being able to use language about maximising shareholder value exactly right (and even including social responsibility as part of it) does not seem like strong evidence for LLM-derived systems not choosing actions with radical bad consequences for the physical world.

Comment by L Rudolf L (LRudL) on Review: Amusing Ourselves to Death · 2022-12-24T14:17:11.697Z · LW · GW

Thank you for your comment! I'm glad you enjoyed the review.

Before you pointed it out, I hadn't made the connection between the type of thing that Postman talks about in the book and increasing cultural safety-ism. Another interesting take you might be interested in is by J. Storrs Hall in Where is my flying car? - he argues that increasing cultural safety-ism is a major force slowing down technological progress. You can read a summary of the argument in my review here (search for "perception" to jump to the right part of the review).

Comment by L Rudolf L (LRudL) on AI Risk Intro 1: Advanced AI Might Be Very Bad · 2022-09-12T09:45:52.257Z · LW · GW

That line was intended to (mildly humorously) make the point that we realise and are aware that there are many other serious risks in the popular imagination. Our central point is that AI x-risk is grand civilisational threat #1, so we wanted to lead with that, and since people think many other things are potential civilisational catastrophes (if not x-risks) we thought it made sense to mention those (and also implicitly put AI into the reference class of "serious global concern"). We discussed, and got feedback from several others, on this opener and while there was some discussion we didn't see any fundamental problem with it. The main consideration for keeping it was that we prefer specific and even provocative-leaning writing that makes its claims upfront and without apology (e.g. "AI is a bigger threat than climate change" is a provocative statement; if that is a relevant part of our world model, seems honest to point that out).

The general point we got from your comment is that we judged the way the tone of it comes across very wrongly. Thanks for this feedback; we've changed it. However, we're confused about the specifics of your point, and unfortunately haven't acquired any concrete model of how to avoid similar errors in the future apart from "be careful about the tone of any statements that even vaguely imply something about geopolitics". (I'm especially confused about how you got the reading that we equated the threat level from Putin and nuclear weapons, and it seems to me that the extent that it is "mudslinging" or "propaganda" seems to be the extent to which acknowledging that many people think Putin is a major threat is either of those things.)

In addition to the general tone, an additional thing we got wrong here was not sufficiently disambiguating between "we think these other things are plausible [or, in your reading, equivalent?] sources of catastrophe, and therefore you need a high bar of evidence before thinking AI is a greater one", versus "many people think these are more concrete and plausible sources of catastrophe than AI". The original intended reading was "bold" as in "socially bold, relative to what many people think", and therefore making points only about public opinion.

Correcting the previous mistake might have looked like:

"If human civilisation is destroyed this century, the most likely cause is advanced AI systems. This might sound like a bold claim to many, given that we live on a planet full of existing concrete threats like climate change, over ten thousand nuclear weapons, and Vladimir Putin"

Based on this feedback, however, we have now removed any comparison or mention of non-AI threats. For the record, the entire original paragraph is:

If human civilisation is destroyed this century, the most likely cause is advanced AI systems. This is a bold claim given that we live on a planet that includes climate change, over ten thousand nuclear weapons, and Vladimir Putin. However, it is a conclusion that many people who think about the topic keep coming to. While it is not easy to describe the case for risks from advanced AI in a single piece, here we make an effort that assumes no prior knowledge. Rather than try to argue from theory straight away, we approach it from the angle of what computers actually can and can’t do.

Comment by L Rudolf L (LRudL) on Review: Structure and Interpretation of Computer Programs · 2022-04-12T18:35:57.303Z · LW · GW

This is an interesting point, I haven't thought about the relation to SVO/etc. before! I wonder whether SVO/SOV dominance is a historical quirk, or if the human brain actually is optimized for those.

The verb-first emphasis of prefix notation like in classic Lisp is clearly backwards sometimes. Parsing this has high mental overhead relative to what it's expressing:

(reduce +
        (filter even?
               (take 100 fibonacci-numbers)))

I freely admit this is more readable:


Clojure, a modern Lisp dialect, solves this with threading macros. The idea is that you can write

(->> fibonacci-numbers
     (take 100)
     (filter even?)
     (reduce +))

and in the expressions after ->> the previous expression gets substituted as the last argument to the next.

Thanks to the Lisp macro system, you can write a threading macro even in a Lisp that doesn't have it (and I know that for example in Racket you can import a threading macro package even though it's not part of the core language).

As for God speaking in Lisp, we know that He at least writes it:

Comment by L Rudolf L (LRudL) on Review: Structure and Interpretation of Computer Programs · 2022-04-12T18:20:45.087Z · LW · GW

In my experience the sense of Lisp syntax being idiosyncratic disappears quickly, and gets replaced by a sense of everything else being idiosyncratic.

The straightforward prefix notation / Lisp equivalent of return x1 if n = 1 else return x2 is (if (= n 1) x1 x2). To me this seems shorter and clearer. However I admit the clarity advantage is not huge, and is clearly subjective.

(An alternative is postfix notation: ((= n 1) x1 x2 if) looks unnatural, though (2 (3 4 *) +) and (+ 2 (* 3 4)) aren't too far apart in my opinion, and I like the cause->effect relationship implied in representing "put 1, 2, and 3 into f" as (1 2 3 f) or (1 2 3 -> f) or whatever.)

Note also that since Lisp does not distinguish between statements and values:

  • you don't need return, and
  • you don't need a separate ternary operator when you want to branch in a value (the x if c else y syntax in Python for example) and for normal if.

I think Python list comprehensions (or the similarly-styled things in e.g. Haskell) are a good example of the "other way" of thinking about syntax. Guido van Rossum once said something like: it's clearer to have [x for x in l if f(x)] than filter(f, l). My immediate reaction to this is: look at how much longer one of them is. When filter is one function call rather than a syntax-heavy list comprehension, I feel it makes it clearer that filter is a single concept that can be abstracted out.

Now of course the Python is nicer because it's more English-like (and also because you don't have to remember whether the f is a condition for the list element to be included or excluded, something that took me embarrassingly long to remember correctly ...). I'd also guess that I might be able to hammer out Python list comprehensions a bit faster and with less mental overhead in simple cases, since the order in which things are typed out is more like the order in which you think of it.

However, I do feel the Englishness starts to hurt at some point. Consider this:

[x for y in l for x in y]

What does it do? The first few times I saw this (and even now sometimes), I would read it, backtrack, then start figuring out where the parentheses should go and end up confused about the meaning of the syntax: "x for y in l, for x in y, what? Wait no, x, for y in l, for x in y, so actually meaning a list of every x for every x in every y in l".

What I find clearer is something like:

(mapcat (lambda (x) x) l)


(reduce append l)

Yes, this means you need to remember a bunch of building blocks (filter, map, reduce, and maybe more exotic ones like mapcat). Also, you need to remember which position which argument goes in (function first, then collection), and there are no syntactic signposts to remind you, unlike with the list comprehension syntax. However, once you do:

  • they compose and mix very nicely (for example, (mapcat f l) "factors into" (reduce append (map f l))), and
  • there are no "seams" between the built-in list syntax and any compositions on top of them (unlike Python, where if you define your own functions to manipulate lists, they look different from the built-in list comprehension syntax).

I think the last point there is a big consideration (and largely an aesthetic one!). There's something inelegant about a programming language having:

  • many ways to write a mapping from values to values, some in infix notation (1+1) and some in prefix notation (my_function(val)), and others even weirder things (x if c else y);
  • expressions that may either reduce to a value (most things) or then not reduce to a value (if it's an if or return or so on);
  • a syntax style you extend in one way (e.g. prefix notation with def my_function(val): [...]) and others that you either don't extend, or extend in weird ways  (def __eq__(self, a, b): [...]).

Instead you can make a programming language that has exactly one style of syntax (prefix), exactly one type of compound expression (parenthesised terms where the first thing is the function/macro name), and a consistent way to extend all the types of syntax (define functions or define macros). This is especially true since the "natural" abstract representation of a program is a tree (in the same way that the "natural" abstract representation of a sentence is its syntax tree), and prefix notation makes this very clear: you have a node type, and the children of the node.

I think the crux is something like: do you prefer a syntax that is like a collection of different tools for different tasks, or a syntax that highlights how everything can be reduced to a tight set of concepts?

Comment by L Rudolf L (LRudL) on Competence/Confidence · 2021-11-22T00:33:08.314Z · LW · GW

Since some others are commenting about not liking the graph-heavy format: I really liked the format, in particular because having it as graphs rather than text made it much faster and easier to go through and understand, and left me with more memorable mental images. Adding limited text probably would not hurt, but adding lots would detract from the terseness that this presentation effectively achieves. Adding clear definitions of the terms at the start would have been valuable though.

Rather than thinking of a single example that I carried throughout as you suggest, I found it most useful to generate one or more examples as I looked at each graph (e.g. for the danger-zone graphs, in order: judging / software testing, politics, forecasting / medical diagnosis).

Comment by L Rudolf L (LRudL) on Review: Foragers, Farmers, and Fossil Fuels · 2021-09-03T23:29:21.729Z · LW · GW

Regarding the end of slavery: I think you make good points and they've made me update towards thinking that the importance of materialistic Morris-style models is slightly less and cultural models slightly more.

I'd be very interested to hear what were the anti-slavery arguments used by the first English abolitionists and the medieval Catholic Church (religion? equality? natural rights? utilitarian?).

Which, evidently, doesn't prevent the usual narrative from being valid in other places, that is, countries in which slavery was still well accepted finding themselves forced, first militarily, then technologically, and finally economically, to adapt or perish.

I think there's also another way for the materialistic and idealistic accounts to both be true in different places: Morris' argument is specifically about slavery existing when wage incentives are weak, and perhaps this holds in places like ancient Egypt and the Roman Empire, but had stopped holding in proto-industrial places like 16th-18th century western Europe. However I'm not aware of what specific factor would drive this.

One piece of evidence on whether economics or culture is more important would be comparing how many cases there are where slavery existed/ended in places without cultural contact but with similar economic conditions and institutions, to how many cases there are of slavery existing/ending in places with cultural contact but different economic conditions/institutions.

Comment by L Rudolf L (LRudL) on Review: Foragers, Farmers, and Fossil Fuels · 2021-09-03T22:59:02.657Z · LW · GW

Thank you for this very in-depth comment. I will reply to your points in separate comments, starting with:

According him, the end of the feudal system in England, and its turning into a modern nation-state, involved among other things the closing off and appropriation, by nobles as a reward from the kingdom, of the former common farmlands they farmed on, as well as the confiscation of the lands owned by the Catholic Church, which for all practical purposes also served as common farmlands. This resulted in a huge mass of landless farmers with no access to land, or only very diminished access, who in turn decades later became the proletarians for the newly developing industries. If that's accurate, then it may be the case that the Industrial Revolution wouldn't have happened had all those poor not have existed, since the very first industries wouldn't have been attractive compared to condition non-forcibly-starved farmers had.

This is very interesting and something I haven't seen before. Based on some quick searching, this seems to be referring to the Inclosure Acts (which were significant, affecting 1/6th of English land) and perhaps specifically this one, while the Catholic Church land confiscation was the 1500s one. My priors on this having a major effect are somewhat skeptical because:

  1. The general shape of English historical GDP/capita is a slight post-plague rise, followed by nothing much until a gradual rise in the 1700s and then takeoff in the 1800s. Likewise, skimming through this, there seem to be no drastic changes in wealth inequality around the time of the Inclosure Acts, though share of wealth held by the top 10% slightly rise in the late 1700s and personal estates (note: specifically excludes real estate) of farmers and yeomen slightly drop around 1700 before rebounding. Any pattern of more poor farmers must evade these statistics, either by being small enough, or by not being captured in these crude overall stats (which is very possible, especially if the losses for one set of farmers were balanced by gains for another).
  2. Other sources I've read support the idea that farmers in general prefer industrial jobs. It's not just Steven Pinker either; Vaclav Smil's Energy and Civilization (my review) has this passage:

Moreover, the drudgery of field labor in the open is seldom preferable even to unskilled industrial work in a factory. In general, typical factory tasks require lower energy expenditures than does common farm work, and in a surprisingly short time after the beginning of mass urban industrial employment the duration of factory work became reasonably regulated 

It's probably the case that it's easier to recruit landless farmers into industrial jobs, and I can imagine plausible models where farmers resist moving to cities, especially for uncertainty-avoidance / risk-aversion reasons. However, the effect of this, especially in the long term, seems limited by things like population growth in (already populous) cities, people having to move off their family farms anyways due to primogeniture, and people generally being pretty good at exploiting available opportunities. An exception might be if early industrialization was tenable only under a strict labor availability threshold that was met only because of the mass of landless farmers created by the English acts.

Comment by L Rudolf L (LRudL) on Review: Foragers, Farmers, and Fossil Fuels · 2021-09-03T21:59:23.435Z · LW · GW

Thanks for the link to Sarah Constantin's post! I remember reading it a long time ago but couldn't have found it again now if I had tried. It was another thing (along with Morris's book) that made me update towards thinking that historical gender norms are heavily influenced by technology level and type. Evidence that technology type variation even within farming societies had major impacts on gender norms also seems like fairly strong support for Morris' idea that the even larger variation between farming societies and foragers/industrialists can explain their different gender norms.

John Danaher's work looks relevant to this topic, but I'm not convinced that his idea of collective/individual/artificial intelligence as the ideal types of future axiology space is cutting it in the right way. In particular, I have a hard time thinking of how you'd summarize historical value changes as movement in the area spanned by these types.