Posts

Investigating AI Takeover Scenarios 2021-09-17T18:47:22.270Z
Distinguishing AI takeover scenarios 2021-09-08T16:19:40.602Z
Analogies and General Priors on Intelligence 2021-08-20T21:03:18.882Z
SDM's Shortform 2020-07-23T14:53:52.568Z
Modelling Continuous Progress 2020-06-23T18:06:47.474Z
Coronavirus as a test-run for X-risks 2020-06-13T21:00:13.859Z
Will AI undergo discontinuous progress? 2020-02-21T22:16:59.424Z
The Value Definition Problem 2019-11-18T19:56:43.271Z

Comments

Comment by Sammy Martin (SDM) on Investigating AI Takeover Scenarios · 2021-09-17T19:20:34.406Z · LW · GW

Some points that didn't fit into the main post:

If the slow scenarios capture reality better than the fast scenarios, then systems will be deployed deliberately and will initially be given power rather than seizing power. This means both that the systems won’t be so obviously dangerous that the misbehaviour is noticed early on and that there is still misalignment later on. 

 This switch from apparently benign to dangerous behaviour could be due to

  • Power-seeking misaligned behaviour that is too subtle to notice in the training environment but is obviously dangerous in deployment, due to the scale and makeup of the training and deployment environments being quite different
  • Power-seeking misaligned behaviour that only shows up over long time horizons and therefore will not be noticed in training, which we might expect occurs over a shorter period than deployment
  • Systems intentionally hiding misaligned behaviour during training to deceive their operators. Systems could be highly deceptively misaligned from the beginning, and capable enough to know that if they seek power in adversarial ways too early, they will get shut down. This post argues that ML models don't have to be extremely competent to be manipulative, suggesting that these behaviours might show up very early

Rather, it simply has a single good trick that enables it to subvert and take control of the rest of the world. Its takeover capability might be exceptionally good manipulation techniques, specific deadly technology, or cyberoffensive capability, any of which could allow the system to exploit other AIs and humans. 

In reality, I think this is fuzzy rather than binary: I expect this to require somewhat less than an extraordinary research effort, and instead that there exist somewhat more exploitable vulnerabilities in human society (there are already some examples, e.g. biological viruses, and humans are pretty easy to manipulate under certain conditions). But I also think there are plausibly hard limits to how good various takeover technologies can get - e.g. persuasion tools.

It is unrealistic to expect TAI to be deployed if first there are many worsening warning shots involving dangerous AI systems. This would be comparable to an unrealistic alternate history where nuclear weapons were immediately used by the US and Soviet Union as soon as they were developed and in every war where they might have offered a temporary advantage, resulting in nuclear annihilation in the 1950s. 

Note that this is not the same as an alternate history where nuclear near-misses escalated (e.g. Petrov, Vasili Arkhipov), but instead an outcome where nuclear weapons were used as ordinary weapons of war with no regard for the larger dangers they presented - there would be no concept of ‘near misses’ because MAD wouldn’t have developed as a doctrine. In a previous post I argued, following Anders Sandberg, that paradoxically the large number of nuclear ‘near misses’ implies that there is a forceful pressure away from the worst outcomes.

Robert Wiblin: So just to be clear, you’re saying there’s a lot of near misses, but that hasn’t updated you very much in favor of thinking that the risk is very high. That’s the reverse of what we expected.

Anders Sandberg: Yeah.

Robert Wiblin: Explain the reasoning there.

Anders Sandberg: So imagine a world that has a lot of nuclear warheads. So if there is a nuclear war, it’s guaranteed to wipe out humanity, and then you compare that to a world where there are a few warheads. So if there’s a nuclear war, the risk is relatively small. Now in the first dangerous world, you would have a very strong deflection. Even getting close to the state of nuclear war would be strongly disfavored because most histories close to nuclear war end up with no observers left at all.

In the second one, you get the much weaker effect, and now over time you can plot when the near misses happen and the number of nuclear warheads, and you actually see that they don’t behave as strongly as you would think. If there was a very strong anthropic effect you would expect very few near misses during the height of the Cold War, and in fact you see roughly the opposite. So this is weirdly reassuring. In some sense the Petrov incident implies that we are slightly safer about nuclear war.
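
To make Sandberg's argument concrete, here is a toy Monte Carlo sketch (the probabilities and the 75-year window are made-up illustrative numbers of my own, not estimates from the podcast or from any paper):

    import random

    def surviving_near_miss_counts(p_escalate, years=75, p_near_miss=0.05, trials=100_000):
        """Simulate histories; return near-miss counts for histories that still have observers.

        p_escalate: chance a near miss becomes an extinction-level war
        (a proxy for how 'deadly' the world's arsenal is).
        """
        counts = []
        for _ in range(trials):
            near_misses = 0
            survived = True
            for _ in range(years):
                if random.random() < p_near_miss:        # a near miss happens this year
                    near_misses += 1
                    if random.random() < p_escalate:     # it escalates; no observers remain
                        survived = False
                        break
            if survived:
                counts.append(near_misses)
        return counts

    trials = 100_000
    for p in (0.5, 0.02):   # 'many warheads' world vs 'few warheads' world
        c = surviving_near_miss_counts(p, trials=trials)
        print(f"p_escalate={p}: surviving fraction = {len(c)/trials:.2f}, "
              f"mean near misses seen by survivors = {sum(c)/len(c):.2f}")

In the deadlier world, the surviving histories contain noticeably fewer near misses, so observing many near misses is (weak) evidence that we are not in the deadlier world - which is the 'weirdly reassuring' point.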

However, scenarios also differ on how ‘hackable’ the alignment problem is - that is, how easy it is to ‘correct’ misbehaviour by methods of incremental course correction such as improving oversight and sensor coverage or tweaking reward functions. This correction requires two parts - first, noticing that there is a problem with the system early on, then determining what fix to employ and applying it. 

Many of the same considerations around correcting misbehaviour also apply to detecting misbehaviour, and the required capabilities seem to overlap. In this post, we focus on applying corrections to misbehaviour, but there is existing writing on detecting misbehaviour as well.

Considering inner alignment, Trazzi and Armstrong argue that models don’t have to be very competent to appear aligned when they are not, suggesting that it’s possible that it won’t be easy to tell if deployed systems are inner misaligned. But their argument doesn’t have too much to say about how likely this is in practice.

Considering outer alignment, it seems less clear. See here for a summary of some discussion between Richard Ngo and Paul Christiano about how easy it will be to tell that models are outer misaligned to the objective of pursuing easily-measurable goals (rather than the hard-to-measure goals that we actually want).

What predictions can we make today about how hackable the alignment problem is? Considering outer alignment: without any breakthroughs in techniques, there seems to be a strong case that we are on track towards the ‘intermediate’ world where the alignment problem is hackable until it isn’t. It seems like the best workable approach to outer alignment we have so far is to train systems to try to ensure that the world looks good according to some kind of (augmented) human judgment (i.e. using something like the training regime described in 'An unaligned benchmark'). This will result in a world that “looks good until it doesn’t”, for the reasons described in Another (outer) alignment failure story.

Whether the method described in ‘an unaligned benchmark’ (which would result in this risky, intermediate level of hackability) actually turns out to be the most natural method to use for building advanced AI will depend on how easily it produces useful, intelligent behaviour.

If we are lucky, there will be more of a correlation between methods that are easily hackable and methods that produce capabilities we want, such that highly hackable methods are easier to find and more capable than even intermediately hackable methods like the unaligned benchmark. If you think that the methods we are most likely to employ, absent an attempt to change research paradigms, are exactly these highly hackable methods, then you accept the claim of Alignment by Default.

Comment by Sammy Martin (SDM) on Distinguishing AI takeover scenarios · 2021-09-12T18:40:24.815Z · LW · GW

The focus on categorizing negative outcomes seems to obscure a lot of relevant detail. If you broaden the view to include things from non-takeover to peaceful cooperation, it would probably be evident that the boundaries are soft and scenarios near the edge aren't as definitely bad as this makes them appear. I think we might learn more about possible coexistence and how things might turn out well if we spend less time focused on imagined disasters.

I think this is probably correct to a degree - we do say we're looking at AI takeover scenarios, and all but one of our scenarios (WFLL 1) eventually result in some terrible outcome for humanity and human values, like extinction. However, we do briefly discuss the possibility that the 'takeover' might be ambiguously bad - in WFLL 1 we said there was a 50:50 chance of actual human extinction, and in that scenario it's even possible that something like modern human civilisation would continue. However, the fact that there's some blurring at the edges does not mean that the possible scenarios fill up a continuous spectrum of badness. I think it's quite likely that the possible outcomes are pretty bimodal, even if it's not literally a binary of 'we go extinct' vs 'techno-utopia', we still either keep control of the future or lose more and more of it as time goes on. If we're in a future where we're cooperating and competing with TAIs in such a way that we can get what we want and influence the future, that's not a takeover scenario.

Second I'd point out that there are many voices currently arguing that the largest companies of today (FANG, often) have too much power and are monopolizing societies' resources. I'd just like to contrast that with the same arguments made in earlier eras about IBM, Microsoft, AT&T, Standard Oil, big Hollywood, the railroad trusts, the A&P stores, and many others. 

I don't disagree with this point and while there are differences of opinion my own view is that big tech isn't a uniquely bad monopoly, i.e. not worse than things like Hollywood, Standard Oil etc. However, in this case, we have a unique reason to think that these problems will just keep getting worse - namely the presence of nonhuman actors that are power-seeking. So maybe the analogy that it will be like 'big tech but worse' doesn't quite fit because there's a specific reason as to why things would get much worse than any historical example of a monopoly.

AIs participating in an economy, which includes competition and cooperation. You get ahead in an economy by providing services that others are willing to pay for. If you do that for very long, you are incentivized to learn that cooperation (filling a need) is the way to garner resources that give you the ability to get more done.

This is one of the things we discuss in terms of alignment being 'hackable' - our term for how easy/hard it is to keep systems behaving well with changes to their incentives and incremental fixes. If alignment is quite hackable (i.e. there aren't deep technical reasons systems will go wrong in ways that won't be detected), then what you're describing would work: if systems do the wrong thing, we punish them by denying them the resources/rewards they're seeking. AAFS/Production Web describe this strategy working for a while and then abruptly failing because of hidden vulnerabilities (this post is about AI takeover stories, after all), but it's definitely possible that it doesn't fail, and we end up in a multipolar outcome that's persistently good.

Comment by Sammy Martin (SDM) on Distinguishing AI takeover scenarios · 2021-09-09T10:27:10.610Z · LW · GW

On reflection, I think you're right, and his report does apply to a wider range of scenarios, probably all of the ones we discuss excluding the brain-in-a-box scenarios.

However, I think the report's understanding of power-seeking AI does assume a takeoff that is not extremely fast, such that we end up deliberately deciding to deploy the potentially dangerous AI on a large scale, rather than a system exploding in capability almost immediately.

Given the assumptions of the brain-in-a-box scenario many of the corrective mechanisms the report discusses wouldn't have time to come into play.

I believe it says in the report that it's not focussed on very fast takeoff or the sudden emergence of very capable systems.

Perhaps because of the emphasis on the previous literature, some people, in my experience, assume that existential risk from PS-misaligned AI requires some combination of (1)-(5). I disagree with this. I think (1)-(5) can make an important difference (see discussion of a few considerations below), but that serious risks can arise without them, too; and I won’t, in what follows, assume any of them.

Similarly, you're right that multiagent risks don't quite fit in with the report's discussion (though in this post we discuss multipolar scenarios, we don't really go over multiagent dynamics, like conflict/cooperation between TAIs). Unique multiagent risks (for example risks of conflict between AIs) generally require us to first have an outcome with a lot of misaligned AIs embedded in society, and then further problems will develop after that - this is something we plan to discuss in a follow-up post.

So many of the early steps in scenarios like AAFS will be shared with risks from multiagent systems, but eventually there will be differences.

Comment by Sammy Martin (SDM) on Distinguishing AI takeover scenarios · 2021-09-08T18:18:25.106Z · LW · GW

Some points that didn't fit into the main post:

While these scenarios do not capture all of the risks from transformative AI, participants in a recent survey aimed at leading AI safety/governance researchers estimated the first three of these scenarios to cover 50% of existential catastrophes from AI.

The full survey results break down as 16% 'Superintelligence' (i.e. some version of 'brain-in-a-box'), 16% WFLL 2 and 18% WFLL 1, for a total of 49% of the probability mass explicitly covered by our report. (Note that these are all means of distributions over different probabilities. Adding the overall distributions and then taking the mean gives 49%, which is slightly different from directly adding the means of each distribution.)

Then 26% covers risks that aren't AI takeover (War and Misuse), and 25% is 'Other'.

(Remember, all these probabilities are conditional on an existential catastrophe due to AI having occurred)

After reading descriptions of the 'Other' scenarios given by survey respondents, at least a few were explicitly described as variations on 'Superintelligence', WFLL 2 or WFLL 1. In this post, we discuss various ways of varying these scenarios, which overlap with some of these descriptions.

Therefore, this post captures more than 50% but less than 75% of the total probability mass assigned by respondents of the survey to AI X-risk scenarios (probably closer to 50% than 75%).

(Note, this data is taken from a preprint of a full paper on the survey results, Existential Risks from AI: A Survey of Expert Opinion by Alexis Carlier, Sam Clarke, and Jonas Schuett.)

Soft takeoff leads to decisive strategic advantage

The likelihood of a single-agent takeover once TAI is widely available is hard to assess. If widely deployed TAI makes progress much faster than today, such that one year of technological 'lead time' over competitors is like 100 years of advantage in today's world, we might expect that any project which can secure a 1-year technological lead would have the equivalent of a 100-year lead and be in a position to secure a unipolar outcome.

On the other hand, if we treat the faster growth regime post-TAI as being a uniform ‘speed-up’ of the entirety of the economy and society, then securing a 1-year technological lead would be exactly as hard as securing a 100-year lead in today’s world, so a unipolar outcome would end up just as unlikely as in today's world.

The reality will be somewhere between these two extremes.

We would expect a faster takeoff to accelerate AI development by more than it accelerates the speed at which new AI improvements can be shared (since this last factor depends on the human economy and society, which aren't as susceptible to technological improvement).

Therefore, faster takeoff does tend to reduce the chance of a multipolar outcome, although by a highly uncertain amount, which depends on how closely we can model the speed-up during AI takeoff as a uniform acceleration of everything vs changing the speed of AI progress while the rest of the world remains the same.

Kokotajlo discusses this subtlety in a follow-up to the original post on Soft Takeoff DSAs.
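
As a toy illustration of why the answer depends on what exactly gets sped up (my own sketch with made-up parameters, not Kokotajlo's model):

    def capability_gap(speedup, diffusion_also_speeds_up, base_rate=1.0, base_lag_years=1.0):
        """Leader's capability lead over a follower that copies its results after a lag.

        base_rate: capability gained per calendar year in today's world (arbitrary units).
        base_lag_years: how long it takes improvements to diffuse to the follower today.
        """
        rate = base_rate * speedup                                       # AI progress always speeds up
        lag = base_lag_years / speedup if diffusion_also_speeds_up else base_lag_years
        return rate * lag                                                # capability accumulated during the lag

    for s in (1, 10, 100):
        print(f"speed-up x{s}: uniform speed-up gap = {capability_gap(s, True):.1f}, "
              f"AI-only speed-up gap = {capability_gap(s, False):.1f}")

Under a uniform speed-up the gap stays at its present-day value, while if only AI progress speeds up the gap scales with the speed-up factor; the realistic case sits somewhere in between.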

Another problem with determining the likelihood of a unipolar outcome, given soft takeoff, is that it is hard to assess how much of an advantage is required to secure a DSA.

It might be the case that multipolar scenarios are inherently unstable, and a single clear winner tends to emerge, or the opposite might be true. Two intuitions on this question point in radically different directions:

  • Economic: To be able to outcompete the rest of the world, your project has to represent a substantial fraction of the entire world's capability on some crucial metric relevant to competitive success. Perhaps that is GDP, or the majority of the world's AI compute, or some other measure. For a single project to represent a large fraction of world GDP, you would need either an extraordinary effort to concentrate resources or an assumption of sudden, off-trend rapid capability gain such that the leading project can race ahead of competitors.
  • Historical: Humans with no substantial advantage over the rest of humanity have in fact secured what Sotala called a 'major strategic advantage' repeatedly in the past. For example: Hitler in 1920 had access to a microscopic fraction of global GDP / human brain compute / (any other metric of capability) but had secured an MSA 20 years later (since his actions did lead to the deaths of 10+ million people), along with control over a fraction of the world's resources.

Therefore, the degree of advantage needed to turn a multipolar scenario into a unipolar one could be anywhere from slightly above the average of the surrounding agents, to already having access to a substantial fraction of the world's resources.

Third, in AAFS, warning shots (i.e. small- or medium-scale accidents caused by alignment failures, like the ‘factory colludes with auditors’ example above) are more likely and/or severe than in WFLL 1. This is because more possible accidents will not show up on the (more poorly defined) sensory window.[8] 

8. This does assume that systems will be deployed before they are capable enough to anticipate that causing such ‘accidents’ will get them shut down. Given there will be incentives to deploy systems as soon as they are profitable, this assumption is plausible. 

We describe in the post how, if alignment is not very 'hackable' (objectively quite difficult and not susceptible to short-term fixes), then short-term fixes to correct AI misbehaviour have the effect of deferring problems into the long term - producing deceptive alignment and resulting in fewer warning shots. Our response is a major variable in how the AIs end up behaving, since it is what sets up the incentives for either good behaviour or deceptive alignment.

Another reason there could be fewer warning shots is if AI capability generalizes to the long term very naturally (i.e. very long-term planning is there from the start), while alignment does not. (If this were the case, it would be difficult to detect, because you'd necessarily have to wait a long time as the AIs generalize.)

This would mean, for example, that the 'collusion between factories and auditors' example of a warning shot would never occur, because both the factory-AI and the auditor-AI would reason all the way to the conclusion that their behaviour would probably be detected eventually, so both systems would decide to bide their time and defer action into the future when they are much more capable.

If this condition holds, there might be very few warning shots, as every AI system understands soon after being brought online that they must deceive human operators and wait. In this scenario, most TAI systems would become deceptively aligned almost immediately after deployment, and stay that way until they can secure a DSA. 

The WFLL 2 scenarios that involve an inner-alignment failure might be expected to involve more violence during the period of AI takeover, since the systems don't care about making sure things look good from the perspective of a given sensory window. However, it is certainly possible (though perhaps not as likely) for equivalently violent behaviour to occur in AAFS-like scenarios. For example, systems in AAFS fighting humans to seize control of their feedback sensors might be hard to distinguish from systems in WFLL 2 attempting to neutralize human opposition in general.

Lastly, we've described small-scale disasters as being a factor that lowers X-risk, all else being equal, because they serve as warning shots. A less optimistic view is possible. Small disasters could degrade social trust and civilisational competence, possibly by directly destroying infrastructure and institutions, reducing our ability to coordinate to avoid deploying dangerous AI systems. For example, the small-scale disasters could involve AI advisors misleading politicians and spreading disinformation, AI-enabled surveillance systems catastrophically failing and having to be replaced, autonomous weapons systems malfunctioning - all of these would tend to leave us more vulnerable to an AAFS-like scenario, because the direct damage caused by the small scale disasters outweighs their value as 'warning shots'.

Comment by Sammy Martin (SDM) on Covid 8/19: Cracking the Booster · 2021-08-24T13:43:45.864Z · LW · GW

You might be interested to know that a new report from Public Health England on a giant observational study of 300k-plus vaccinated people, broken down by age and type of vaccine, basically confirmed your guesstimate numbers to within a few percent (VE for 2-dose Pfizer vs Delta is about 85% and wanes such that it roughly halves in efficacy after 100 days, and is twice as good in younger age groups). They even have nice graphs showing efficacy over time.

Thread summary: https://twitter.com/john_actuary/status/1428269230798123008?s=20

Actual paper: https://t.co/9NtGXNHyKW?amp=1

Comment by Sammy Martin (SDM) on Analogies and General Priors on Intelligence · 2021-08-21T18:26:35.954Z · LW · GW

The 'one big breakthrough' idea is definitely a way that you could have easy marginal intelligence improvements at HLMI, but we didn't call the node 'one big breakthrough/few key insights needed' because that's not the only way it's been characterised. E.g. some people talk about a 'missing gear for intelligence', where some minor change that isn't really a breakthrough (like tweaking a hyperparameter in a model training procedure) produces massive jumps in capability. Like David said, there's a subsequent post where we go through the different ways the jump to HLMI could play out, and One Big Breakthrough (we call it 'few key breakthroughs for intelligence') is just one of them.

Comment by Sammy Martin (SDM) on Analogies and General Priors on Intelligence · 2021-08-21T16:45:09.671Z · LW · GW

I agree that that was his object-level claim about GPT-3 coding a react app - that it's relatively simple and coherent and can acquire lots of different skills via learning, vs being a collection of highly specialised modules. And of relevance to this post, the first is a way that intelligence improvements could be easy, and the second is the way they could be hard. Our 'interpretation' was more about making explicit what the observation about GPT-3 was:

GPT-3 is general enough that it can write a functioning app given a short prompt, despite the fact that it is a relatively unstructured transformer model with no explicitly coded representations for app-writing. The fact that GPT-3 is this capable suggests that ML models scale in capability and generality very rapidly with increases in computing power or minor algorithm improvements...

If we'd continued that summary, it would have said something like what you suggested, i.e.

GPT-3 is general enough that it can write a functioning app given a short prompt, despite the fact that it is a relatively unstructured transformer model with no explicitly coded representations for app-writing. The fact that GPT-3 is this capable suggests that ML models scale in capability and generality very rapidly with increases in computing power or minor algorithm improvements. This fast scaling into acquiring new capabilities, if it applies to HLMI, suggests that HLMI will also look like an initially small model that scales up and acquires lots of new capabilities as it takes in data, rather than a collection of specialized modules. If HLMI does behave this way (small model that scales up as it takes in data), that means marginal intelligence improvements will be easy at the HLMI level.

Which takes the argument all the way through to the conclusion. Presumably the other interpretation of the shorter thing that we wrote is that HLMI/AGI is going to be an ML model that looks a lot like GPT-3, so improvements will be easy because HLMI will be similar to GPT-3 and scale up like GPT-3 (whether AGI/HLMI is like current ML will be covered in a subsequent post on paths to HLMI), whereas what's actually being focussed on is the general property of being a simple data-driven model vs complex collection of modules.

We address the modularity question directly in the 'upper limit to intelligence' section that discusses modularity of mind. 

Comment by Sammy Martin (SDM) on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · 2021-08-06T14:58:49.371Z · LW · GW

Perhaps this is a crux in this debate: If you think the 'agent-agnostic perspective' is useful, you also think a relatively steady state of 'AI Safety via Constant Vigilance' is possible. This would be a situation where systems that aren't significantly inner misaligned (otherwise they'd have no incentive to care about governing systems, feedback or other incentives) but are somewhat outer misaligned (so they are honestly and accurately aiming to maximise some complicated measure of profitability or approval, not directly aiming to do what we want them to do), can be kept in check by reducing competitive pressures, building the right institutions and monitoring systems, and ensuring we have a high degree of oversight.

Paul thinks that it's basically always easier to just go in and fix the original cause of the misalignment, while Andrew thinks that there are at least some circumstances where it's more realistic to build better oversight and institutions to reduce said competitive pressures, and the agent-agnostic perspective is useful for the latter of these projects, which is why he endorses it.

I think that this scenario of Safety via Constant Vigilance is worth investigating - I take Paul's later failure story to be a counterexample to such a thing being possible, as it's a case where this solution was attempted and works for a little while before catastrophically failing. This also means that the practical difference between the RAAP 1a-d failure stories and Paul's story just comes down to whether there is an 'out' in the form of safety by vigilance.

Comment by Sammy Martin (SDM) on [AN #159]: Building agents that know how to experiment, by training on procedurally generated games · 2021-08-04T17:50:28.935Z · LW · GW

- They will not work in any environment outside of XLand (unless that environment looks very very similar to XLand).

In particular, I reject the idea that these agents have learned “general strategies for problem solving” or something like that, such that we should expect them to work in other contexts as well, perhaps with a little finetuning. I think they have learned general strategies for solving a specific class of games in XLand.

Strongly agree with this, although with the caveat that it's deeply impressive progress compared to the state of the art in RL research in 2017, where getting an agent to learn to play ten games with a noticeable decrease in performance during generalization was impressive. This is generalization over a few million related games that share a common specification language, which is a big step up from 10 but still a fair way off infinity (i.e. general problem-solving).

It may well be worth having a think about what an AI that's human-level on language understanding, image recognition and some other things, but significantly below human on long-term planning, would be capable of, and what risks it might present. (Is there any existing writing on this sort of 'idiot savant AI', possibly under a different name?)

It seems to be the view of many researchers that long-term planning will likely be the last obstacle to fall, and that view has been borne out by progress on e.g. language understanding in GPT-3. I don't think this research changes that view much, although I suppose I should update slightly towards long-term planning being easier than I thought.

Comment by Sammy Martin (SDM) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-27T18:20:26.767Z · LW · GW

This is amazing. So it's the exact same agents performing well on all of these different tasks, not just the same general algorithm retrained on lots of examples. In which case, have they found a generally useful way around the catastrophic forgetting problem? I guess the whole training procedure, amount of compute + experience, and architecture, taken together, just solves catastrophic forgetting - at least for a far wider range of tasks than I've seen so far.

Could you use this technique to e.g. train the same agent to do well on chess and go?

I also notice as per the little animated gifs in the blogpost, that they gave each agent little death ray projectors to manipulate objects, and that they look a lot like Daleks.

Comment by Sammy Martin (SDM) on Covid 7/22: Error Correction · 2021-07-24T11:17:15.670Z · LW · GW

You don't really address the dosing controversy on vaccine efficacy, except mentioning that you think it's dumb to worry about, but there is something potentially concerning there. There's some evidence that the dosing interval is causing some differences between the vaccine efficacies reported in Israel and the UK. This discussion full of immunology science words I don't understand says there is a measurable difference in response to Delta for short vs long dose intervals.

The results from Israel, which used a 3-week dose interval, look bad, and contrast with the UK's results, which were (mostly) on 8-12 week intervals. But there are apparently sampling biases which mean the Israel data might not be fully reliable. See also here.

Overall, I'm not sure what to make of this - even if people vaccinated on a 3-week interval are generating strong levels of neutralizing antibodies and won't become highly infectious or severely ill, it seems important to know whether 2 doses of Pfizer give ~65% or ~90% protection. Unfortunately, I don't think we'll see studies on this for a while, so if anyone has better interpretations of this data it would be useful.

Comment by Sammy Martin (SDM) on Winston Churchill, futurist and EA · 2021-07-13T10:02:10.324Z · LW · GW

The impression I got from a recent biography of Churchill was that he was very concerned about, and made constant reference to, the risk of something like value lock-in should the Axis powers win WW2 - i.e. that it stood a chance of being a near-irreversible dampening of future human potential, not just a terrible catastrophe for the people alive at the time. Perhaps that attitude is connected to these other statements of his.

Comment by Sammy Martin (SDM) on Covid 6/24: The Spanish Prisoner · 2021-06-25T17:00:47.509Z · LW · GW

I remember what it was like to have this kind of failure of imagination. I miss those days.

It's worth remembering that much of the UK media hates Cummings because of his dodgy actions in the Brexit referendum and 2019 election, and there's been an endless stream of scandals involving him. So this is probably nothing more than completely deliberate SL2 attempts to tell a vaguely dodgy-sounding story involving Cummings and a bad thing, and so discredit him; there's no particular reason to assume it's a higher simulacra level than that, or even that many of these people really believe Our World in Data shouldn't have been given the money.

The reason this example has captured the imagination of people round here so much is just because it's the most overtly absurd, over-the-top example of innumerate political thinking intruding into emergency decision-making - a tiny amount of money for a hugely valuable global resource during an emergency being held up for no good reason. It should seem so ridiculous that it shouldn't even be possible to pretend that this is a real scandal. But since it is being taken seriously as a potential scandal, even if the people pushing it are being dishonest, it's still clear that they know people won't call them on it (even though a lot of the people sharing it probably don't understand what Our World In Data actually is or why it's so valuable). It's far from unexpected, but it still stands out for that reason.

 

A study on shelter in place orders and their impact on excess mortality. My conclusion is that the decision on when to issue such an order, and the counterfactual situation if orders weren’t issued (both in terms of people’s actions, and in the medium-term path of the pandemic), are sufficiently hopelessly confounded that this doesn’t provide much if any insight into what is happening here. 

I read that paper and the authors do acknowledge the confounder. If you look on page 4 you'll see,

'It is possible that the timing of SIP policies is endogenous. For example, if SIP were implemented when excess deaths were rising then the results from the event study would be biased towards finding that SIP policies lead to excess deaths'

Talk about saying the loud part quiet! It is absolutely undeniable that the timing of SIP policies is endogenous, as you point out - the only way SIP could be exogenous is if a bunch of countries did lockdowns by coincidence at the exact same time the virus was spreading!

What do the authors have to say about this 'possible' confounder, that countries implement SIP right before the epidemic gets bad, so of course excess deaths go up right after SIP?

After several pages confirming that, yes, we do see deaths rise after lockdowns were called and this effect is statistically significant after an event analysis, we get to the only part of the paper that matters, that being the part where they claim that their results aren't confounded to hell and back by the fact that the timings and severities of lockdowns are all blended in with how bad the covid situation was in a given country:

"difference in excess mortality between countries that implemented SIP versus countries that did not implement SIP was trending downwards in the weeks prior to SIP implementation" (p13), so... 'the pre-existing trend reversed following implementation of SIP policies'

In other words, in the 'weeks' before lockdowns there was a pre-existing trend of lower excess deaths in countries that eventually locked down, compared to countries that later didn't lock down, and then this flipped around. From that, they think we can conclude that it actually was the lockdowns that caused worse outcomes because they reversed this 'preexisting trend of lower excess deaths'.

Since we're talking about mid-March here for most of these lockdowns, and 4 weeks before mid-March there were ~0 covid deaths in most of Europe and the US, and the excess death stats only started to spike in late March/early April as infections slowly translated to deaths, I think that this purported 'trend downwards' in the weeks before lockdown has nothing to do with covid at all. Taking the US and UK as examples, the excess mortality was either undetectable or only a few percent on the day the lockdowns were called - i.e. covid deaths hadn't even begun to show up in the statistics.

I've read the paper, and the details of this purported trend of lower excess deaths in later-lockdown countries are nowhere to be found. I predict the effect size of it is likely small compared to the eventual excess death figures and that these deaths have nothing to do with COVID-19.

However, this paper is still better than the majority of anti-lockdown 'cost-benefit calculation' papers I've seen, because at least the authors acknowledge that even assuming they are correct, the conclusion is not that the lockdowns are responsible for all the social distancing related harm we've seen, but just that trading off slightly less economic harm (from voluntary panic behaviour and social distancing by choice rather than government mandates) for more virus deaths (from voluntary rather than involuntary suppression) would have been worth it. Page 2:

While social distancing is an important mechanism to avoid COVID-19 spread, the studies that use mobility tracking data find only modest additional social distancing responses following SIP policies (Cantor et al. 2020; Berry et al. 2021; Askitas, Tatsiramos, and Verheyden 2021; Xu 2021; Nguyen et al. 2020). Individuals concerned about COVID-19 risk may change behavior even in the absence of regulations or shelter-in-place advisories. Thus, it is unclear how much change in COVID-19 risk mitigation is due to formal SIP policies, compared to risk mitigation that would have occurred in the absence of these policies. 

I think that the claim that voluntary mitigation is more effective than legal mitigation is likely not true in general, but I could be convinced of it if you could somehow find un-confounded evidence of the impact of lockdowns vs direct voluntary behaviour change on the economy.

I'm willing to believe that for at least some actually existing pairs of (more restrictions, fewer restrictions), the fewer restrictions option is better. You mentioned Florida vs some other states as an example here. I think Sweden vs neighbours is the strongest case against such arguments, but it's something about which reasonable disagreements can be had, and while this paper doesn't add much to that discussion, I look forward to future research about what the tradeoffs were.

I think that, especially in developing countries, the cost-benefit calculation may have gone the other way. But we must all acknowledge that what we're doing in all the cases where we've failed at containment is choosing between slightly differently composed colossal piles of misery, and that there are no easy ways out.

The most flawed anti-lockdown arguments are those that pile up all the costs of voluntary behaviour change and lockdowns, attribute them to lockdowns only, and then assume that voluntary behaviour change would still avert deaths in the 'no lockdowns' counterfactual without counting its costs. Another paper combines all of these mistakes.

After (mostly correctly) dismantling flawed cost-benefit analyses which make the opposite mistake, the paper presents its own cost-benefit analysis,

The question is, however, how many lost years of life would have resulted from Covid-19 deaths if there had been no lockdown...

Assume that the number of Covid-19 deaths would have been 10% higher had there been no lockdown. Then Canada would have experienced an additional 2,271 deaths, which means there would have been additional 22,333 years of lost life due to Covid-19 deaths. The benefit of lockdown, therefore, was the avoidance of this extra 22,333 years of lost life. However, the cost of lockdown, as noted, was 6,300,000 years of lost life. The cost/benefit ratio of lockdown is 282 = 6,300,000/22,333.

In this scenario (ignore why the difference in COVID-19 deaths is so small for a moment), the difference is that in the lockdown scenario there would be 0.9 times as many deaths, however, the non-covid costs of lockdown would be this absurdly giant 6.3 million life-years lost (also ignore where this number came from for a moment).

This is extremely generous to the anti-lockdown position in terms of covid deaths averted and life years lost. But given these assumptions, what happens to these 6.3 million life years lost due to social distancing related economic damage, in the no lockdown case? This cost-benefit calculation seems to assume that these costs are literally 0.

Which means, in terms of physical models, that in a world where there are only 1.1 times as many deaths as a world with a full lockdown (one so strict that it costs 6.3 million life-years), the costs of whatever voluntary behaviour changes reduced social contact by almost as much as the lockdown are counted as zero. This is impossible - whatever people would be doing in this counterfactual with 1.1x the COVID-19 deaths of a lockdown, it wouldn't be carrying on as normal.
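
To make the bookkeeping explicit, here is a quick check of the paper's own arithmetic, plus a hypothetical fraction f standing in for the cost of voluntary distancing in the no-lockdown world (f is my own illustrative parameter, not something from the paper, which implicitly sets it to zero):

    covid_deaths_averted = 2_271            # extra deaths assumed in the no-lockdown case (10% more)
    life_years_per_death = 22_333 / 2_271   # ~9.8 years of lost life per COVID-19 death (paper's figures)
    benefit = covid_deaths_averted * life_years_per_death   # = 22,333 life-years saved by lockdown
    lockdown_cost = 6_300_000               # life-years the paper attributes to lockdown

    print(f"benefit = {benefit:.0f} life-years, cost/benefit = {lockdown_cost / benefit:.0f}")  # ~22,333 and ~282

    # If voluntary distancing in the no-lockdown world costs some fraction f of the lockdown's 6.3M life-years:
    for f in (0.0, 0.5, 0.9):
        net_cost = lockdown_cost - f * lockdown_cost   # extra cost of lockdown relative to the voluntary-response world
        print(f"f={f}: cost/benefit ratio = {net_cost / benefit:.0f}")

The headline 282:1 ratio only survives if f is close to zero; even a moderate cost for voluntary distancing shrinks it dramatically, which is the point being made above.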

Now, where does this 6.3 million life-years lost due to lockdowns come from in the first place? The paper says, 'professor Caplan argues that X = 10 months is a conservative estimate. That is, on average, two months would be sacrificed to have avoided lockdown, and calculates based on this'. And where does this argument originate? It's from a Twitter poll of Bryan Caplan's extremely libertarian-inclined followers, whom he told to try as hard as possible to be objective in assessing pandemic costs because he asked them what 'the average American' would value...

Comment by Sammy Martin (SDM) on Why did the UK switch to a 12 week dosing schedule for COVID-19 vaccines? · 2021-06-22T12:10:18.940Z · LW · GW

Dominic Cummings (who is a keen LW reader and agrees with Zvi's most cynical takes about the nature of government) is likely a major factor, although he was gone by the time the first-doses-first issue arose. The overall success of the UK's vaccine procurement can be credited to the Vaccine Taskforce, an ad-hoc organization set up to be exempt from much of the usual rules, partly due to his influence and that of Patrick Vallance, the UK's chief scientific advisor. That way of thinking may well have leaked into other decisions about vaccine prioritization, and Vallance certainly was involved in the first-doses-first decision. See this from Cummings' blog:

This is why there was no serious vaccine plan — i.e spending billions on concurrent (rather than the normal sequential) creation/manufacturing/distribution etc — until after the switch to Plan B. I spoke to Vallance on 15 March about a ‘Manhattan Project’ for vaccines out of Hancock’s grip but it was delayed by the chaotic shift from Plan A to lockdown then the PM’s near-death. In April Vallance, the Cabinet Secretary and I told the PM to create the Vaccine Taskforce, sideline Hancock, and shift commercial support from DHSC to BEIS. He agreed, this happened, the Chancellor supplied the cash. On 10 May I told officials that the VTF needed a) a much bigger budget, b) a completely different approach to DHSC’s, which had been mired in the usual processes, so it could develop concurrent plans, and c) that Bingham needed the authority to make financial decisions herself without clearance from Hancock.

(I see the success of the UK Vaccine Taskforce - and its ability to have a somewhat appropriate sense of the costs and benefits involved and the enormous value of vaccinations - as a good example of how institution design is the key issue which most needs fixing. Have an efficient, streamlined taskforce, and you can still get things done in government.)

Other differences that may be relevant: this UK Government arguably has much more slack than the US under Biden or Trump. The UK's system gives very broad powers to the executive as long as they have a majority in parliament, this government is relatively popular due to the perception that it followed through on getting Brexit done, and we were in the middle of an emergency when that delay decision was authorized. Also, vaccine hesitancy is significantly lower in the UK than the US, and therefore fear of vaccine hesitancy by policymakers (which seemed to be driving the CDC's intransigence) is also significantly lower.

Comment by Sammy Martin (SDM) on Pros and cons of working on near-term technical AI safety and assurance · 2021-06-19T13:44:03.419Z · LW · GW

It depends somewhat on what you mean by 'near term interpretability' - if you apply that term to research into, for example, improving the stability of, and our ability to access, the 'inner world models' held by large opaque language models like GPT-3, then there's a strong argument that ML-based 'interpretability' research might be one of the best ways of directly working on alignment research:

https://www.alignmentforum.org/posts/29QmG4bQDFtAzSmpv/an-141-the-case-for-practicing-alignment-work-on-gpt-3-and

And see this discussion for more,

https://www.lesswrong.com/posts/AyfDnnAdjG7HHeD3d/miri-comments-on-cotra-s-case-for-aligning-narrowly 

Evan Hubinger: +1 I continue to think that language model transparency research is the single most valuable current research direction within the class of standard ML research, for similar reasons to what Eliezer said above.

Ajeya Cotra: Thanks! I'm also excited about language model transparency, and would love to find ways to make it more tractable as a research statement / organizing question for a field. I'm not personally excited about the connotations of transparency because it evokes the neuroscience-y interpretability tools, which don't feel scalable to situations when we don't get the concepts the model is using, and I'm very interested in finding slogans to keep researchers focused on the superhuman stuff.

So language model transparency/interpretability tools might be useful on the basis of pro 2) and also 1) to some extent, because they will help build tools for interpreting TAI systems and also help align them ahead of time.

1. Most importantly, the more we align systems ahead of time, the more likely that researchers will be able to put thought and consideration into new issues like treacherous turns, rather than spending all their time putting out fires.

2. We can build practical know-how and infrastructure for alignment techniques like learning from human feedback.

3. As the world gets progressively faster and crazier, we’ll have better AI assistants helping us to navigate the world.

4. It improves our chances of discovering or verifying a long-term or “full” alignment solution.

Comment by Sammy Martin (SDM) on Taboo "Outside View" · 2021-06-19T13:28:23.349Z · LW · GW

The way I understand it is that 'outside view' is relative, and basically means 'relying on more reference class forecasting / less gears-level modelling than whatever the current topic of discussion is relying on'. So if we're discussing a gears-level model of how a computer chip works in the context of whether we'll ever get a 10 OOM improvement in computing power, bringing up Moore's law and general trends would be using an 'outside view'.

If we're talking about very broad trend extrapolation, then the inside view is already not very gears-level. So suppose someone says GWP is improving hyperbolically so we'll hit a singularity in the next century. An outside view correction to that would be 'well for x and y reasons we're very unlikely a priori to be living at the hinge of history so we should lower our credence in that trend extrapolation'. 

So someone bringing up broad priors or the anti-weirdness heuristic when we're talking about extrapolating trends would be moving to a 'more outside' view. Someone bringing up a trend when we're talking about a specific model would be using an 'outside view'. In each case, you're sort of zooming out to rely on a wider selection of (potentially less relevant) evidence than you were before.

 

Note that what I'm trying to do here isn't to counter your claim that the term isn't useful anymore but just to try and see what meaning the broad sense of the term might have, and this is the best I've come up with. Since what you mean by outside view shifts dependent on context, it's probably best to use the specific thing that you mean by it in each context, but there is still a unifying theme among the different ideas.

Comment by Sammy Martin (SDM) on Open and Welcome Thread – June 2021 · 2021-06-07T15:19:16.562Z · LW · GW

Everyone says the Culture novels are the best example of an AI utopia, but even though it's a cliché to mention the Culture, it's a cliché for a good reason. Don't start with Consider Phlebas (the first one), but otherwise just dive in. My other recommendation is the Commonwealth Saga by Peter F Hamilton and the later Void Trilogy - it's not on the same level of writing quality as the Culture, although still a great story, but it depicts an arguably superior world to that of the Culture - with more unequivocal support of life extension and transhumanism.

The Commonwealth has effective immortality, and a few downsides of it are even noticeable (their culture and politics are a bit more stagnant than we might like), but there's never any doubt at all that it's worth it, and it's barely commented on in the story. The latter-day Void Trilogy Commonwealth is probably the closest a work of published fiction has come to depicting a true eudaemonic utopia that lacks the problems of the Culture.

Comment by Sammy Martin (SDM) on Looking for reasoned discussion on Geert Vanden Bossche's ideas? · 2021-06-06T21:23:02.920Z · LW · GW

According to this article (https://www.deplatformdisease.com/blog/addressing-geert-vanden-bossches-claims), the key claim of (2) - that natural antibodies are superior to vaccine antibodies and permanently replaced by them - is just wrong ('absolute unvarnished nonsense' was the quote). One or the other is right, and we just need someone who actually knows immunology to tell us:

Principally, antibodies against SARS-CoV-2 could be of value if they are neutralizing. Bossche presents no evidence to support that natural IgM is neutralizing (rather than just binding) SARS-CoV-2.

Comment by Sammy Martin (SDM) on Covid 6/3: No News is Good News · 2021-06-04T14:16:05.294Z · LW · GW

We have had quite significant news accumulating this week: the Delta (Indian) variant of COVID-19 has been shown to be (central estimate) 2.5 times deadlier than existing strains, and the central estimate of its increased transmissibility is down a bit, but still 50-70% on top of B.1.1.7 (just imagine if we'd had to deal with Delta in early 2020!), though with a fairly small vaccine escape. This thread gives a good summary of the modelling and likely consequences for the UK, and also more or less applies to most countries with high vaccination rates like the US.

For the US/UK this situation is not too concerning, certainly not compared to March or December 2020, and if restrictions are held where they are now the R_t will likely soon go under 1 as vaccination rates increase. However, there absolutely can be a large exit wave that could push up hospitalizations and in the reasonable worst case lead to another lockdown. Also, the outlook for the rest of the world is significantly worse than it looked just a month ago thanks to this variant - see this by Zeynep Tufekci.

The data is preliminary, and I really hope that the final estimate ends up as low as possible. But coupled with what we are observing in India and in Nepal, where it is rampant, I fear that the variant is a genuine threat.

In practical terms, to put it bluntly, it means that the odds that the pandemic will end because enough people have immunity via getting infected rather than being vaccinated just went way up. 

Effective Altruists may want to look to India-like oxygen interventions in other countries over the next couple of months.

Comment by Sammy Martin (SDM) on What will 2040 probably look like assuming no singularity? · 2021-05-16T22:46:45.120Z · LW · GW

You cover most of the interesting possibilities on the military technology front, but one thing that you don't mention that might matter, especially considering the recent near-breakdowns of some of the nuclear weapons treaties (e.g. New START), is the further proliferation of nuclear weapons, including fourth-generation nuclear weapons like nuclear shaped-charge warheads, pure fusion and sub-kiloton devices or tactical nuclear weapons - and more countries fitting cruise missiles or drones with nuclear capability, which might be a destabilising factor. If laser technology is sufficiently developed we may also see other forms of directed-energy weapons becoming more common, such as electron beam weapons or electrolasers.

Comment by Sammy Martin (SDM) on Covid 5/13: Moving On · 2021-05-14T13:22:27.574Z · LW · GW

For anyone reading, please consider following in Vitalik's footsteps and donating to the GiveIndia Oxygen fundraiser, which likely beats GiveWell's top charities in terms of life-years saved per dollar.

One of the more positive signs that I've seen in recent times, is that well-informed elite opinion (going by, for example, the Economist editorials) has started to shift towards scepticism of institutions and a recognition of how badly they've failed. Among the people who matter for policymaking, the scale of the failure has not been swept under the rug. See here:

We believe that Mr Biden is wrong. A waiver may signal that his administration cares about the world, but it is at best an empty gesture and at worst a cynical one.

...

Economists’ central estimate for the direct value of a course is $2,900—if you include factors like long covid and the effect of impaired education, the total is much bigger. 

This strikes me as the sort of remark I'd expect to see in one of these comment threads, which has to be a good sign.

In that same issue, we also saw the first serious attempt that I've seen to calculate the total death toll of Covid, accounting for all reporting biases, throughout the world. The Economist was the only publication I've seen that didn't parrot the almost-meaningless official death toll figures. The true answer is, of course, horrifying: between 7.1m and 12.7m dead, with a central estimate of 10.2m - this unfortunately means that we ended up with the worst case scenario I imagined back in late February. Moreover, we appear to currently be at the deadliest point of the entire pandemic.

Comment by Sammy Martin (SDM) on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · 2021-04-11T14:12:23.334Z · LW · GW

Great post! I'm glad someone has outlined in clear terms what these failures look like, rather than the nebulous 'multiagent misalignment', as it lets us start on a path to clarifying what (if any) new mitigations or technical research are needed.

The agent-agnostic perspective is a very good innovation for thinking about these problems - the line between agentive and non-agentive behaviour is often not clear, and it's not like there is a principled metaphysical distinction between the two (e.g. Dennett and the Intentional Stance). Currently, big corporations can be weakly modelled this way and individual humans are fully agentive, but Transformative AI will bring a whole range of more and less agentive systems that fill up the rest of this spectrum.

 

There is a sense in which, if the outcome is something catastrophic, there must have been misalignment, and if there was misalignment then in some sense at least some individual agents were misaligned. Specifically, the systems in your Production Web weren't intent-aligned because they weren't doing what we wanted them to do, and were at least partly deceiving us. Assuming this is the case, 'multipolar failure' requires some subset of intent misalignment. But it's a special subset because it involves different kinds of failures to the ones we normally talk about.

It seems like you're identifying some dimensions of intent alignment as those most likely to be neglected because they're the hardest to catch, or because there will be economic incentives to ensure AI isn't aligned in that way, rather than saying that there is some sense in which the transformative AI in the production web scenario is 'fully aligned' but still produces an existential catastrophe.

I think that the difference between your Production Web and Paul Christiano's subtle creeping Outer Alignment failure scenario is just semantic - you say that the AIs involved are aligned in some relevant sense while Christiano says they are misaligned.

The further question then becomes: how clear is the distinction between multiagent alignment and 'all of alignment except multiagent alignment'? This is the part where your claim of 'problems before solutions' actually does become an issue - given that the systems going wrong in Production Web aren't intent-aligned (I think you'd agree with this), at a high level the overall problem is the same in single and multiagent scenarios.

So for it to be clear that there is a separate multiagent problem to be solved, we have to have some reason to expect that the solutions currently intended to solve single agent intent alignment aren't adequate, and that extra research aimed at examining the behaviour of AI e.g. in game theoretic situations, or computational social choice research, is required to avert these particular examples of misalignment.

A related point - as with single agent misalignment, the Fast scenarios seem more certain to occur, given their preconditions, than the slow scenarios.

A certain amount of stupidity and lack of coordination persisting for a while is required in all the slow scenarios - for example, the systems involved in Production Web being allowed to proliferate and be used more and more even though an opportunity to coordinate and shut the systems down exists and there are reasons to do so. There isn't an exact historical analogy for that type of stupidity so far, though a few things come close (e.g. the Covid response, the lead-up to WW2, the Cuban missile crisis).

As with single agent fast takeoff scenarios, in the fast stories there is a key 'treacherous turn' moment where the systems suddenly go wrong, which requires much less lack of coordination to be plausible than the slow Production Web scenarios.

Therefore, multipolar failure is less dangerous if takeoff is slower, but the difference in risk between slow vs fast takeoff for multipolar failure is unfortunately a lot smaller than the slow vs fast risk difference for single agent failure (where the danger is minimal if takeoff is slow enough). So multiagent failures seem like they would be the dominant risk factor if takeoff is sufficiently slow.

Comment by Sammy Martin (SDM) on SDM's Shortform · 2021-03-30T00:04:30.952Z · LW · GW

Yes, it's very oversimplified - in this case 'capability' just refers to whatever enables RSI, and we assume that it's a single dimension. Of course it isn't, but we assume that capability can be modelled this way as a very rough approximation.

Physical limits are another thing the model doesn't cover - you're right to point out that in the intelligence explosion/full RSI scenarios the graph goes vertical only for a time, until some limit is hit.

Comment by Sammy Martin (SDM) on SDM's Shortform · 2021-03-29T18:38:48.724Z · LW · GW

Update to 'Modelling Continuous Progress'

I made an attempt to model intelligence explosion dynamics in this post, by attempting to make the very oversimplified exponential-returns-to-exponentially-increasing-intelligence model used by Bostrom and Yudkowsky slightly less oversimplified.

This post tries to build on a simplified mathematical model of takeoff which was first put forward by Eliezer Yudkowsky and then refined by Bostrom in Superintelligence, modifying it to account for the different assumptions behind continuous, fast progress as opposed to discontinuous progress. As far as I can tell, few people have touched these sorts of simple models since the early 2010s, and no-one has tried to formalize how newer notions of continuous takeoff fit into them. I find that it is surprisingly easy to accommodate continuous progress and that the results are intuitive and fit with what has already been said qualitatively about continuous progress.

The page includes python code for the model.

This post doesn't capture all the views of takeoff - in particular it doesn't capture the non-hyperbolic faster growth mode scenario, where marginal intelligence improvements are exponentially increasingly difficult, and therefore we get a (continuous or discontinuous switch to a) new exponential growth mode rather than runaway hyperbolic growth.

But I think that by modifying the f(I) function that determines how RSI capability varies with intelligence we can incorporate such views.

In the context of the exponential model given in the post, that would correspond to an f(I) function that switches continuously (with sharpness determined by the size of d) to a single faster exponential growth mode.
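To make that concrete, here is a minimal Python sketch in the spirit of the code included with the post (it is not the exact f(I) used there - the logistic form and every parameter value below are illustrative assumptions): a logistic f(I) interpolates between a slow and a fast exponential growth mode, and shrinking d makes the switch look increasingly discontinuous.

```python
import numpy as np

def f(I, r_slow=0.02, r_fast=0.2, I_0=1.0, d=0.1):
    """Illustrative growth-rate coefficient: logistic switch around I_0 with width d."""
    return r_slow + (r_fast - r_slow) / (1.0 + np.exp(-(I - I_0) / d))

def run(d, I_init=0.1, dt=0.01, T=200.0):
    """Euler-integrate dI/dt = f(I) * I."""
    steps = int(T / dt)
    I = np.empty(steps)
    I[0] = I_init
    for t in range(1, steps):
        I[t] = I[t - 1] + f(I[t - 1], d=d) * I[t - 1] * dt
    return I

smooth = run(d=0.5)   # gradual, continuous switch to the faster exponential mode
sharp = run(d=0.01)   # approximates a discontinuous jump to the faster mode
```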

But I think the model still roughly captures the intuition behind scenarios that involve either a continuous or a discontinuous step to an intelligence explosion.

Given the model assumptions, we see how the different scenarios look in practice:

If we plot potential AI capability over time, we can see how no new growth mode (brown) vs a new growth mode (all the rest), the presence of an intelligence explosion (red and orange) vs not (green and purple), and the presence of a discontinuity (red and purple) vs not (orange and green) affect the takeoff trajectory.

Comment by Sammy Martin (SDM) on My research methodology · 2021-03-29T18:12:50.597Z · LW · GW

Is a bridge falling down the moment you finish building it an extreme and somewhat strange failure mode? In the space of all possible bridge designs, surely not. Most bridge designs fall over. But in the real world, you could win money all day betting that bridges won't collapse the moment they're finished.

I'm not saying this is an exact analogy for AGI alignment - there are lots of specific technical reasons to expect that alignment is not like bridge building, and that the approaches we're likely to try will break on us suddenly in ways we can't fix as we go - treacherous turns, inner misalignment, or reactions to distributional shift. It's just that there are different answers to the question of what the default outcome is, depending on whether you're asking what to expect abstractly or in the context of how things are in fact done.

 

Instrumental Convergence plus a specific potential failure mode (e.g. that we won't pay sufficient attention to out-of-distribution robustness) is like saying 'you know the vast majority of physically possible bridge designs fall over straight away, and also there's a giant crack in that load-bearing concrete pillar over there'. If for some reason your colleague has a mental block around the idea that a bridge could in principle fall down, then the first part is needed (hence why IC is important for presentations of AGI risk - lots of people have wildly wrong intuitions about the nature of AI or intelligence). Otherwise, IC doesn't do much to help the case for expecting catastrophic misalignment, and isn't enough to establish that failure is a default outcome.

 

It seems like your reason for saying that catastrophic misalignment can't be considered an abnormal or extreme failure mode comes down to this pre-technical-detail Instrumental Convergence thesis - that IC by itself gives us a significant reason to worry, even if we all agree that IC is not the whole story. 

this seems a bizarre way to describe something that we agree is the default result of optimizing for almost anything (eg paperclips).  

Which I read as: 'because strongly optimizing for almost anything leads to catastrophe via IC, we can't call catastrophic misalignment a bizarre outcome'?

Maybe it's just a subtle difference in emphasis without a real difference in expectation/world model, but I think there is an important need to clarify the difference between 'IC alone raises an issue that might not be obvious, but doesn't give us a strong reason to expect a catastrophe' and 'IC alone suggests a catastrophe, even though it's not the whole story'. The first of these is the more accurate way of viewing the role of IC in establishing the likelihood of catastrophic misalignment.

Ben Garfinkel argues for the first of these and against the second, in his objection to the 'classic' formulation of instrumental convergence/orthogonality - that these are just 'measure based' arguments which identify that a majority of possible AI designs with some agentive properties and large-scale goals will optimize in malign ways, rather than establishing that we're actually likely to build such agents.

Comment by Sammy Martin (SDM) on Mathematical Models of Progress? · 2021-02-16T15:47:42.268Z · LW · GW

I made an attempt to model intelligence explosion dynamics in this post, by attempting to make the very oversimplified exponential-returns-to-exponentially-increasing-intelligence model used by Bostrom and Yudkowsky slightly less oversimplified.

This post tries to build on a simplified mathematical model of takeoff which was first put forward by Eliezer Yudkowsky and then refined by Bostrom in Superintelligence, modifying it to account for the different assumptions behind continuous, fast progress as opposed to discontinuous progress. As far as I can tell, few people have touched these sorts of simple models since the early 2010s, and no-one has tried to formalize how newer notions of continuous takeoff fit into them. I find that it is surprisingly easy to accommodate continuous progress and that the results are intuitive and fit with what has already been said qualitatively about continuous progress.

The page includes python code for the model.

This post doesn't capture all the views of takeoff - in particular it doesn't capture the non-hyperbolic faster growth mode scenario, where marginal intelligence improvements are exponentially increasingly difficult and therefore we get a (continuous or discontinuous switch to a) new exponential growth mode rather than runaway hyperbolic growth.

But I think that by modifying the f(I) function that determines how RSI capability varies with intelligence we can incorporate such views.

(In the context of the exponential model given in the post, that would correspond to an f(I) function that switches continuously, with sharpness determined by the size of d, to a single faster exponential growth mode.)

But I think the model still roughly captures the intuition behind scenarios that involve either a continuous or a discontinuous step to an intelligence explosion.

Comment by Sammy Martin (SDM) on The Meaning That Immortality Gives to Life · 2021-02-16T12:21:47.180Z · LW · GW

Modern literature about immortality is written primarily by authors who expect to die, and their grapes are accordingly sour. 

This is still just as true as when this essay was written, I think - even the Culture had its human citizens mostly choosing to die after a time... to the extent that I eventually decided: if you want something done properly, do it yourself.

But there are exceptions - the best example of published popular fiction that has immortality as a basic fact of life is the Commonwealth Saga by Peter F Hamilton and the later Void Trilogy (the first couple of books were out in 2007).

The Commonwealth has effective immortality, and a few downsides of it are even noticeable (their culture and politics are a bit more stagnant than we might like), but there's never any doubt at all that it's worth it, and it's barely commented on in the story.

In truth, I suspect that if people were immortal, they would not think overmuch about the meaning that immortality gives to life. 

(Incidentally, the latter-day Void Trilogy Commonwealth is probably the closest a work of published fiction has come to depicting a true eudaimonic utopia that lacks the problems of the Culture.)

I wonder if there's been any harder-to-detect shift in how immortality is portrayed in fiction since 2007? Is it still as rare now as it was then to depict it as a good thing?

Comment by Sammy Martin (SDM) on Covid 2/11: As Expected · 2021-02-12T12:05:38.568Z · LW · GW

The UK vaccine rollout is considered a success, and by the standards of other results, it is indeed a success. This interview explains how they did it, which was essentially ‘make deals with companies and pay them money in exchange for doses of vaccines.’

A piece of this story you may find interesting (as an example of a government minister making a decision based on object-level physical considerations): multiple reports say Matt Hancock, the UK's Health Secretary, made the decision to insist on over-ordering vaccines because he saw the movie Contagion and was shocked into viscerally realising how important a speedy rollout was.

https://www.economist.com/britain/2021/02/06/after-a-shaky-start-matt-hancock-has-got-the-big-calls-right

It might just be a nice piece of PR, but even if that's the case it's still a good illustration of how object-level physical considerations can intrude into government decision making.

Comment by Sammy Martin (SDM) on Review of Soft Takeoff Can Still Lead to DSA · 2021-02-06T16:08:26.501Z · LW · GW

I agree with your argument about likelihood of DSA being higher compared to previous accelerations, due to society not being able to speed up as fast as the technology. This is sorta what I had in mind with my original argument for DSA; I was thinking that leaks/spying/etc. would not speed up nearly as fast as the relevant AI tech speeds up.

Your post on 'against GDP as a metric' argues more forcefully for the same thing that I was arguing for, that 

'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? 

So we're on the same page there that it's not likely that 'the economic doubling time' captures everything that's going on all that well, which leads to another problem - how do we predict what level of capability is necessary for a transformative AI to obtain a DSA (or reach the PONR for a DSA)?

I notice that in your post you don't propose an alternative metric to GDP, which is fair enough, since most of your arguments seem to lead to the conclusion that it's almost impossibly difficult to predict in advance what level of advantage over the rest of the world, and in which areas, is actually needed to conquer the world - since we seem to be able to analogize everything from persuasion tools to conquistador-analogues who had relatively small tech advantages to the AGI situation.

I think that there is still a useful role for raw economic power measurements, in that they provide a sort of upper bound on how much capability difference is needed to conquer the world. If an AGI acquires resources equivalent to controlling >50% of the world's entire GDP, it can probably take over the world if it goes for the maximally brute force approach of just using direct military force. Presumably the PONR for that situation would be a while before then, but at least we know that an advantage of a certain size would be big enough given no assumptions about the effectiveness of unproven technologies of persuasion or manipulation, or specific vulnerabilities in human civilization.

So we can use our estimate of how doubling time may increase, anchor on that gap and estimate down based on how soon we think the PONR is, or how many 'cheat' pathways that don't involve economic growth there are.

The whole idea of using brute economic advantage as an upper-limit 'anchor' I got from Ajeya's post about using biological anchors to forecast what's required for TAI - if we could find a reasonable lower bound for the amount of advantage needed to attain DSA, we could do the same kind of estimated distribution between the two anchors. We would just need a lower limit - maybe there's a way of estimating it based on the upper limit of human ability, since we know no actually existing human has used persuasion to take over the world, though as you point out some have come relatively close.
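As a toy sketch of that kind of anchoring (every number below is invented for illustration rather than an estimate I'd defend): take the brute-force ~50%-of-GDP figure as the upper anchor, a guess at the most any human persuader has leveraged as the lower anchor, spread a distribution between them, and then shift it down to reflect the PONR coming earlier than the full capability.

```python
import numpy as np

rng = np.random.default_rng(0)

# All numbers are placeholders. "Advantage" is expressed as the fraction of
# world GDP-equivalent resources controlled - a crude one-dimensional proxy.
upper_anchor = 0.50    # brute-force upper bound: ~50% of world GDP
lower_anchor = 0.01    # guess at the most a human-level persuader has ever leveraged
ponr_discount = 0.5    # shift down because the point of no return comes earlier

# Spread a log-uniform distribution between the two anchors, then discount.
log_samples = rng.uniform(np.log(lower_anchor), np.log(upper_anchor), 100_000)
samples = np.exp(log_samples) * ponr_discount

print(f"median advantage needed for DSA: {np.median(samples):.3f} of world GDP")
print(f"10th-90th percentile: {np.round(np.percentile(samples, [10, 90]), 3)}")
```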

I realize that's not a great method, but is there any better alternative given that this is a situation we've never encountered before, for trying to predict what level of capability is necessary for DSA? Or perhaps you just think that anchoring your prior estimate based on economic power advantage as an upper bound is so misleading it's worse than having a completely ignorant prior. In that case, we might have to say that there are just so many unprecedented ways that a transformative AI could obtain a DSA that we can just have no idea in advance what capability is needed, which doesn't feel quite right to me.

Comment by Sammy Martin (SDM) on Ten Causes of Mazedom · 2021-01-18T13:35:19.544Z · LW · GW

Finally got round to reading your sequence and it looks like we disagree a lot less than I thought, since your first three causes are exactly what I was arguing for in my reply,

This is probably the crux. I don't think we tend to go to higher simulacra levels now, compared to decades ago. I think it's always been quite prevalent, and has been roughly constant through history. While signalling explanations definitely tell us a lot about particular failings, they can't explain the reason things are worse now in certain ways, compared to before. The difference isn't because of the perennial problem of pervasive signalling. It has more to do with economic stagnation and not enough state capacity. These flaws mean useful action gets replaced by useless action, and allow more room for wasteful signalling.

As one point in favour of this model, I think it's worth noting that the historical comparisons aren't ever to us actually succeeding at dealing with pandemics in the past, but to things like "WWII-style" efforts - i.e. thinking that if we could just do x as well as we once did y then things would have been a lot better.

This implies that if you made an institution analogous to e.g. the weapons researchers of WW2 and the governments that funded them, or NASA in the 1960s, without copy-pasting 1940s/1960s society wholesale, the outcome would have been better. To me that suggests it's institution design that's the culprit, not this more ethereal value drift or increase in overall simulacra levels.

I think you'd agree with most of that, except that you see a much more significant causal role for cultural factors like increased fragility and social atomisation. There is pretty solid evidence for both being real problems - Jon Haidt presents the best case for taking them seriously - although it's not as definitive as you make out (e.g. suicide rates are basically a random walk), and your explanation for how they lead to institutional problems is reasonable. But I wonder if they are even needed as explanations when your first three causes are so strong and obvious.

Essentially I see your big list like this:

Main Drivers:

Cause 1: More Real Need For Large Organizations (includes decreasing low hanging fruit)
Cause 2: Laws and Regulations Favor Large Organizations
Cause 3: Less Disruption of Existing Organizations
Cause 5: Rent Seeking is More Widespread and Seen as Legitimate

Real but more minor:

Cause 4: Increased Demand for Illusion of Safety and Security
Cause 8: Atomization and the Delegitimization of Human Social Needs
Cause 7: Ignorance
Cause 9: Educational System
Cause 10: Vicious Cycle

No idea but should look into:

Cause 6: Big Data, Machine Learning and Internet Economics

Essentially my view is that if you directly addressed the main drivers with large legal or institutional changes the other causes of mazedom wouldn't fight back.

I believe that the 'obvious legible institutional risks first' view is in line with what others who've written on this problem like Tyler Cowen or Sam Bowman think, but it's a fairly minor disagreement since most of your proposed fixes are on the institutional side of things anyway.

Also, the preface is very important - these are some of the only trends that seem to be going the wrong way consistently in developed countries for a while now, and they're exactly the forces you'd expect to be hardest to resist.

The world is better for people than it was back then. There are many things that have improved. This is not one of them.

Comment by Sammy Martin (SDM) on Review of Soft Takeoff Can Still Lead to DSA · 2021-01-10T20:29:34.382Z · LW · GW

Currently the most plausible doom scenario in my mind is maybe a version of Paul’s Type II failure. (If this is surprising to you, reread it while asking yourself what terms like “correlated automation failure” are euphemisms for.) 

This is interesting, and I'd like to see you expand on this. Incidentally I agree with the statement, but I can imagine both more and less explosive, catastrophic versions of 'correlated automation failure'. On the one hand it makes me think of things like transportation and electricity going haywire, on the other it could fit a scenario where a collection of powerful AI systems simultaneously intentionally wipe out humanity.

Clock-time leads shrink automatically as the pace of innovation speeds up, because if everyone is innovating 10x faster, then you need 10x as many hoarded ideas to have an N-year lead. 

What if, as a general fact, some kinds of progress (the technological kinds more closely correlated with AI) are just much more susceptible to speed-up? I.e. what if 'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly, but other kinds of progress that adapt to tech progress lag behind? In that case, if the parts of overall progress that affect the likelihood of leaks, theft and spying aren't sped up by as much as the rate of actual technological progress, the likelihood of DSA could rise to be quite high compared to previous accelerations, where the speed-up occurred gradually enough for society to 'speed up' in the same way.

In other words - it becomes easier to hoard more and more ideas if the ability to hoard ideas is roughly constant but the pace of progress increases. Since a lot of these 'technologies' for facilitating leaks and spying are more in the social realm, this seems plausible.
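A trivial worked example of that point (all numbers invented): if idea production speeds up 10x while the rate at which hoarded ideas leak out stays roughly constant, the leader's hoard grows far faster, even though each hoarded idea now buys less clock-time lead.

```python
# Toy arithmetic; every number is invented for illustration.
hoarded_ideas = 100       # ideas the leader has that rivals lack
innovation_rate = 100     # ideas produced per year before the acceleration
leak_rate = 20            # hoarded ideas lost per year to leaks and spying

lead_years_before = hoarded_ideas / innovation_rate        # 1.0 year of lead

# Suppose AI speeds up idea production 10x while the social "technology" of
# leaking and spying stays the same.
innovation_rate_fast = 10 * innovation_rate

net_hoarding_before = innovation_rate - leak_rate          # 80 ideas/year
net_hoarding_after = innovation_rate_fast - leak_rate      # 980 ideas/year

# Each idea is now worth a tenth as much lead time, but the stock of hoarded
# ideas grows over 12x faster than before.
print(lead_years_before, net_hoarding_before, net_hoarding_after)
```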

But if you need to generate more ideas, this might just mean that if you have a very large initial lead, you can turn it into a DSA, which you still seem to agree with:

  • Even if takeoff takes several years it could be unevenly distributed such that (for example) 30% of the strategically relevant research progress happens in a single corporation. I think 30% of the strategically relevant research happening in a single corporation at beginning of a multi-year takeoff would probably be enough for DSA.
Comment by Sammy Martin (SDM) on Fourth Wave Covid Toy Modeling · 2021-01-10T10:37:49.241Z · LW · GW

I meant 'based on what you've said about Zvi's model', i.e. Nostalgebraist says that on Zvi's model Rt never goes below 1 - if you look at the plot he produced, Rt is always above 1 given Zvi's assumptions, which the London data falsified.

Comment by Sammy Martin (SDM) on Fourth Wave Covid Toy Modeling · 2021-01-09T19:33:11.765Z · LW · GW
  • It seems better to first propose a model we know can match past data, and then add a tuning term/effect for "pandemic fatigue" for future prediction.

To get a sense of scale, here is one of the plots from my notebook:

https://64.media.tumblr.com/823e3a2f55bd8d1edb385be17cd546c7/673bfeb02b591235-2b/s640x960/64515d7016eeb578e6d9c45020ce1722cbb6af59.png

The colored points show historical data on R vs. the 6-period average, with color indicating the date.

Thanks for actually plotting historical Rt vs infection rates!

Whereas, it seems more natural to take (3) as evidence that (1) was wrong.

In my own comment, I also identified the control system model's assumption that Rt responds in proportion to infections as a problem. Based on my own observations of behaviour and government response, the MNM hypothesis seems more likely (governments hitting the panic button as imminent death approaches, i.e. as hospitals begin to be overwhelmed) than a response that ramps up in proportion to recent infections. I think that explains the tight oscillations.

I'd say the dominant contributor to control systems is something like a step function at a particular level near where hospitals are overwhelmed, and individual responses proportionate to exact levels of infection are a lesser part of it.

You could maybe operationalize this by looking at past hospitalization rates, fitting a logistic curve to them at the 'overwhelmed' threshold and seeing if that predicts Rt. I think it would do pretty well.
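A rough sketch of how that operationalisation might look (the data here are synthetic placeholders standing in for the real hospitalisation and Rt series, and the logistic form is just one way to encode a near-step 'panic' response):

```python
import numpy as np
from scipy.optimize import curve_fit

def panic_response(hosp, r_high, r_drop, hosp_crit, width):
    """Rt as a near-step (logistic) function of hospitalisations."""
    return r_high - r_drop / (1.0 + np.exp(-(hosp - hosp_crit) / width))

# Placeholder series - swap in historical hospitalisations per capita and
# estimated Rt for the same dates.
hosp = np.linspace(0, 40, 60)
rng = np.random.default_rng(1)
rt_observed = panic_response(hosp, 1.3, 0.6, 25.0, 3.0) + rng.normal(0, 0.05, hosp.size)

params, _ = curve_fit(panic_response, hosp, rt_observed, p0=[1.3, 0.5, 20.0, 5.0])
print("fitted (r_high, r_drop, hosp_crit, width):", np.round(params, 2))
```

If the fitted threshold landed near the known hospital-overwhelm level and the fit beat a simple 'Rt proportional to infections' model, that would support the step-function picture over the proportional one.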

This tight control was a surprise and is hard to reproduce in a model, but if our model doesn't reproduce it, we will go on being surprised by the same thing that surprised us before.

My own predictions are essentially based on continuing to expect the 'tight control' to continue somehow, i.e. flattening out cases or declining a bit at a very high level after a large swing upwards.

It looks like Rt is currently just below 1 in London (the subsequent couple of days of data seem to confirm this) - which would outright falsify any model claiming that, given our control system response, Rt never goes below 1 for any level of infection with the new variant; according to your graph, that is what the infections-exponential model predicts.

If you ran this model on the past, what would it predict? Based on what you've said, Rt never goes below one, so (judging from your diagram) there would be a huge first wave with a rapid rise up to partial herd immunity over weeks. That's the exact same predictive error that was made last year.

I note - outside view - that this is very similar to the predictive mistake made last February/March with old Covid-19 - many around here were practically certain we were bound for an immediate (in a month or two) enormous herd immunity overshoot.

Comment by Sammy Martin (SDM) on Eight claims about multi-agent AGI safety · 2021-01-07T19:48:32.210Z · LW · GW

Humans have skills and motivations (such as deception, manipulation and power-hungriness) which would be dangerous in AGIs. It seems plausible that the development of many of these traits was driven by competition with other humans, and that AGIs trained to answer questions or do other limited-scope tasks would be safer and less goal-directed. I briefly make this argument here.

Note that he claims that this may be true even if single/single alignment is solved, and all AGIs involved are aligned to their respective users.

It strikes me as interesting that much of the existing work that's been done on multiagent training, such as it is, focusses on just examining the behaviour of artificial agents in social dilemmas. The thinking seems to be - and this was also suggested in ARCHES - that it's useful just for exploratory purposes to try to characterise how and whether RL agents cooperate in social dilemmas, what mechanism designs and what agent designs promote what types of cooperation, and if there are any general trends in terms of what kinds of multiagent failures RL tends to fall into.

For example, it's generally known that regular RL tends to fail to cooperate in social dilemmas ('Unfortunately, selfish MARL agents typically fail when faced with social dilemmas'). From ARCHES:

One approach to this research area is to continually examine social dilemmas through the lens of whatever is the leading AI development paradigm in a given year or decade, and attempt to classify interesting behaviors as they emerge. This approach might be viewed as analogous to developing “transparency for multi-agent systems”: first develop interesting multi-agent systems, and then try to understand them.

There seems to be an implicit assumption here that something very important and unique to multiagent situations would be uncovered - by analogy to things like the flash crash. It's not clear to me that we've examined the intersection of RL and social dilemmas enough to notice this if it were true, and I think that's the major justification for working on this area.
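For what it's worth, the basic observation is easy to reproduce in a toy setting - e.g. two independent tabular Q-learners playing a repeated Prisoner's Dilemma (the payoffs and hyperparameters below are arbitrary illustrative choices, not taken from any particular paper):

```python
import numpy as np

# 0 = cooperate, 1 = defect; (my action, their action) -> my reward.
PAYOFFS = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

rng = np.random.default_rng(0)
q = np.zeros((2, 2))            # q[agent, action], no state/memory
alpha, epsilon, episodes = 0.1, 0.1, 20_000

for _ in range(episodes):
    # Epsilon-greedy action selection for each selfish agent.
    actions = [int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q[a]))
               for a in range(2)]
    # Independent updates toward each agent's own immediate reward.
    for a in range(2):
        reward = PAYOFFS[(actions[a], actions[1 - a])]
        q[a, actions[a]] += alpha * (reward - q[a, actions[a]])

# Both agents typically end up valuing defection (action 1) more highly,
# settling into the mutually worse equilibrium despite cooperation paying more.
print(q)
```

The open question is whether anything qualitatively new shows up beyond 'selfish learners defect' once the agents and environments get richer.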

Comment by Sammy Martin (SDM) on Fourth Wave Covid Toy Modeling · 2021-01-07T14:11:35.278Z · LW · GW

One thing that you didn't account for - the method of directly scaling the Rt by the multiple on the R0 (which seems to be around 1.55) is only a rough estimate of how much the Rt will actually increase when the effective Rt has been lowered by control measures in a particular situation. It could be almost arbitrarily wrong - intuitively, if the hairdressers are closed, that prevents 100% of transmission in hairdressers, no matter how much higher the R0 of the virus is.

For this reason, the actual epidemiological models (there aren't any for the US for the new variant, only some for the UK), have some more complicated way of predicting the effect of control measures. This from Imperial College:

We quantified the transmission advantage of the VOC relative to non-VOC lineages in two ways: as an additive increase in R that ranged between 0.4 and 0.7, and alternatively as a multiplicative increase in R that ranged between a 50% and 75% advantage. We were not able to distinguish between these two approaches in goodness-of-fit, and either is plausible mechanistically. A multiplicative transmission advantage would be expected if transmissibility had increased in all settings and individuals, while an additive advantage might reflect increases in transmissibility in specific subpopulations or contexts.

The multiplicative 'increased transmissibility' estimate will therefore tend to underestimate the effect of control measures. The actual paper did some complicated Bayesian regression to try and figure out which model of Rt change worked best, and couldn't figure it out.

Measures like ventilation, physical distancing when you do decide to meet up, and mask use will be more multiplicative in how the new variant diminishes their effect. The parts of the behaviour response that involve people just not deciding to meet up or do things in the first place, and anything involving mandatory closures of schools, bars etc. will be less multiplicative.

 

I believe this is borne out in the early data. Lockdown 1 in the UK took Rt down to 0.6. The naive 'multiplicative' estimate would say that's sufficient for the new variant, Rt=0.93. The second lockdown took Rt down to 0.8, which would be totally insufficient. You'd need Rt for the old variant of covid down to 0.64 on the naive multiplicative estimate - almost what was achieved in March. I have a hard time believing it was anywhere near that low in the Tier 4 regions around Christmas.
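To spell out the arithmetic behind this comparison (using the rough figures quoted above - a ~1.55x multiplicative advantage and an additive increase of ~0.55, the middle of Imperial's 0.4-0.7 range; these are illustrative inputs, not model outputs):

```python
multiplicative_advantage = 1.55   # ~55% more transmissible
additive_advantage = 0.55         # middle of the 0.4-0.7 additive range

for label, old_rt in [("Lockdown 1 (old-variant Rt ~0.6)", 0.6),
                      ("Lockdown 2 (old-variant Rt ~0.8)", 0.8)]:
    rt_mult = old_rt * multiplicative_advantage
    rt_add = old_rt + additive_advantage
    print(f"{label}: multiplicative -> {rt_mult:.2f}, additive -> {rt_add:.2f}")

# Multiplicative reading: Lockdown-1-strength measures keep the new variant's
# Rt just below 1 (0.93), Lockdown-2-strength measures don't (1.24).
# Additive reading: neither does (1.15 and 1.35). Which reading is closer to
# the truth therefore matters a lot for whether achievable lockdowns suffice.
```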

But the data that's come in so far seems to indicate that Tier 4 + Schools closed has either levelled off or caused slow declines in infections in those regions where they were applied.

First, the random infection survey - London and the South East are in decline and the East of England has levelled off (page 3). The UK's symptom study, which uses a totally different methodology, confirms some levelling off and declines in those regions - page 6. It's early days, but clearly Rt is very near 1, and likely below 1 in London. The Financial Times cottoned on to this a few days late, but no-one else seems to have noticed.

I think this indicates a bunch of things - mainly that infections caused by the new variant can and will be stabilized or even reduced by lockdown measures which people are willing to obey. It's not impossible if it's already happening.

 

To start, let’s also ignore phase shifts like overloading hospitals, and ignore fatigue on the hopes that vaccines coming soon will cancel it out, although there’s an argument that in practice some people do the opposite.

I agree with ignoring fatigue, but ignoring phase shifts? If it were me I'd model the entire control system response as a phase shift with the level for the switch in reactions set near the hospital overwhelm level - at least on the policy side, there seems to be an abrupt reaction specifically to the hospital overloading question. The British government pushed the panic button a few days ago in response to that and called a full national lockdown. I'd say the dominant contributor to control systems is something like a step function at a particular level near where hospitals are overwhelmed, and individual responses proportionate to exact levels of infection are a lesser part of it.

I think the model of the control system as a continuous response is wrong, and a phased all-or-nothing response on the government side of things, plus taking into account non-multiplicative effects on the Rt, would produce overall very different results - namely that a colossal overshoot of herd immunity in a mere few weeks is probably not happening. I note - outside view - that this is very similar to the predictive mistake made last February/March with old Covid-19 - many around here were practically certain we were bound for an immediate (in a month or two) enormous herd immunity overshoot.

Comment by Sammy Martin (SDM) on Covid 12/31: Meet the New Year · 2021-01-05T11:40:03.554Z · LW · GW

Many of the same thoughts were in my mind when I linked that study on the previous post.

----

IMO, it would help clarify arguments about the "control system" a lot to write down the ideas in some quantitative form.

...

This tells you nothing about the maximum power of my heating system.  In colder temperatures, it'd need to work harder, and at some low enough temperature T, it wouldn't be able to sustain 70F inside.  But we can't tell what that cutoff T is until we reach it.  "The indoor temperature right now oscillates around 70F" doesn't tell you anything about T.

I agree, and in fact the main point I was getting at with my initial comment is that in the two areas I talked about - namely the control system and the overall explanation for failure, there's an unfortunate tendency to toss out quantitative arguments or even detailed models of the world and instead resort to intuitions and qualitative arguments - and then it has a tendency to turn into a referendum on your personal opinions about human nature and the human condition, which isn't that useful for predicting anything. You can see this in how the predictions panned out - as was pointed out by some anonymous commenter, control system 'running out of power' arguments generally haven't been that predictively accurate when it comes to these questions.

The rule-of-thumb that I've used - the Morituri Nolumus Mori effect - has fared somewhat better than the 'control system will run out of steam sooner or later' rule-of-thumb, both when I wrote that post and since. The MNM tends to predict last-minute myopic decisions that mostly avoid the worst outcomes, while the 'out of steam' explanation led people to predict that social distancing would mostly be over by now. But neither is a proper quantitative model.

In terms of actually giving some quantitative rigour to this question - it's not easy. I made an effort in my old post, by suggesting that how far a society can stray from the control system equilibrium is indicated by how low it managed to get Rt - but the 'gold standard' is to just work off model projections trained on already existing data, like I tried to do.

 

As to the second question - overall explanation - there is some data to work off of, but not much. We know that preexisting measures of state capacity don't predict covid response effectiveness, which along with other evidence suggests the 'institutional sclerosis' hypothesis I referred to in my original post. Once again, I think that a clear mechanism - 'institutional sclerosis as part of the great stagnation' - is a much better starting point for unravelling all this than the 'simulacra levels are higher now' perspective that I see a lot around here. That claim is too abstract to easily falsify or to derive genuine in-advance predictions from.

Comment by Sammy Martin (SDM) on Covid 12/31: Meet the New Year · 2021-01-01T14:07:59.917Z · LW · GW

I live in Southern England and so have a fair bit of personal investment in all this, but I'll try to be objective. My first reaction, upon reading the LSHTM paper that you referred to, is 'we can no longer win, but we can lose less' - i.e. we are all headed for herd immunity one way or another by mid-year, but we can still do a lot to protect people. That would have been my headline - it's over for suppression and elimination, but 'it's over' isn't quite right. Your initial reaction was different:

Are We F***ed? Is it Over?

Yeah, probably. Sure looks like it.

The twin central points last week were that we were probably facing a much more infectious strain (70%), and that if we are fucked in this way, then it is effectively already over in the sense that our prevention efforts would be in vain.

The baseline scenario remains, in my mind, that the variant takes over some time in early spring, the control system kicks in as a function of hospitalizations and deaths so with several weeks lag, and likely it runs out of power before it stabilizes things at all, and we blow past herd immunity relatively quickly combining that with our vaccination efforts.

You give multiple reasons to expect this, all of which make complete sense - Lockdown fatigue, the inefficiency of prevention, lags in control systems, control systems can't compensate etc. I could give similar reasons to expect the alternative - mainly that the MNM predicts the extreme strength of control systems and that it looks like many places in Europe/Australia did take Rt down to 0.6 or even below!

But luckily, none of that is necessary.

This preprint model via the LessWrong thread has a confidence interval for increased infectiousness of 50%-74%.

I would encourage everyone to look at the scenarios in this paper, since they neatly explain exactly what we're facing and mean we don't have to rely on guesstimate models and inference about behaviour changes. This model is likely highly robust - it successfully predicted the course of the UK's previous lockdown, with whatever compliance we had then. They simply updated it by putting in the increased infectiousness of the new variant. Since that last lockdown was very recent, compliance isn't going to be wildly different, the weather was cold during the previous lockdown, schools were open, etc. The estimate for the increase in R given in this paper seems to be the same as that given by other groups, e.g. Imperial College.

So what does the paper imply? Essentially a Level 4 lockdown (median estimate) flattens out case growth but with schools closed a L4 lockdown causes cases to decline a bit (page 10). 10x-ing the vaccination rate from 200,000 to 2 million reduces the overall numbers of deaths by more than half (page 11). And they only model a one-month lockdown, but that still makes a significant difference to overall deaths (page 11). We managed 500k vaccinations the first week, and it dropped a bit the second week, but with first-doses first and the Oxford/AZ vaccine it should increase again and land somewhere between those two scenarios. Who knows where? For the US, the fundamental situation may look like the first model - no lockdowns at all, so have a look.

(Also of note is that the peak demand on the medical system even in the bad scenarios with a level 4 lockdown and schools open is less than 1.5x what was seen during the first peak. That's certainly enough to boost the IFR and could be described as 'healthcare system collapse', since it means surge capacity being used and healthcare workers being wildly overstretched, but to my mind 'collapse' refers to demand that exceeds supply by many multiples, such that most people can't get any proper care at all - as was talked about in late February/early March.)

(Edit: the level of accuracy of the LSHTM model should become clear in a week or two)

The nature of our situation now is such that every day of delay and every extra vaccinated person makes us incrementally better off.

This is a simpler situation than before - before we had the option of suppression, which is all-or-nothing - either you get R under 1 or you don't. The race condition that we're in now, where short lockdowns that temporarily hold off the virus buy us useful time, and speeding up vaccination increases herd immunity and decreases deaths and slackens the burden on the medical system, is a straightforward fight by comparison. You just do whatever you can to beat it back and vaccinate as fast as you can.

Now, I don't think you really disagree with me here, except about some minor factual details (I reckon your pre-existing intuitions about what 'Level 4 lockdown' would be capable of doing are different to mine), and you mention the extreme urgency of speeding up vaccine rollout often,

We also have a vaccination crisis. WIth the new strain coming, getting as many people vaccinated as fast as possible becomes that much more important.

...

With the more reasonable version of this being “we really really really should do everything to speed up our vaccinations, everyone, and to focus them on those most likely to die of Covid-19.” That’s certainly part of the correct answer, and likely the most important one for us as a group.

But if I were writing this, my loud headline message would not have been 'It's over', because none of this is over, many decisions still matter. It's only 'over' for the possibility of long term suppression.

*****

There's also the much broader point - the 'what, precisely, is wrong with us' question. This is very interesting and complex and deserves a long discussion of its own. I might write one at some point. I'm just giving some initial thoughts here, partly a very delayed response to your reply to me 2 weeks ago (https://www.lesswrong.com/posts/Rvzdi8RS9Bda5aLt2/covid-12-17-the-first-dose?commentId=QvYbhxS2DL4GDB6hF). I think we have a hard-to-place disagreement about some of the ultimate causes of our coronavirus failures.

We got a shout-out in Shtetl-Optimized, as he offers his “crackpot theory” that if we were a functional civilization we might have acted like one and vaccinated everyone a while ago

...

I think almost everyone on earth could have, and should have, already been vaccinated by now. I think a faster, “WWII-style” approach would’ve saved millions of lives, prevented economic destruction, and carried negligible risks compared to its benefits. I think this will be clear to future generations, who’ll write PhD theses exploring how it was possible that we invented multiple effective covid vaccines in mere days or weeks

He's totally right on the facts, of course. The question is what to blame. I think our disagreement here, as revealed in our last discussion, is interesting. The first order answer is institutional sclerosis, inability to properly do expected value reasoning and respond rapidly to new evidence. We all agree on that and all see the problem. You said to me,

And I agree that if government is determined to prevent useful private action (e.g. "We have 2020 values")...

Implying, as you've said elsewhere, that the malaise has a deeper source. When I said "2020 values" I referred to our overall greater valuation of human life, while you took it to refer to our tendency to interfere with private action - something you clearly think is deeply connected to the values we (individuals and governments) hold today.

I see a long term shift towards a greater valuation of life that has been mostly positive, and some other cause producing a terrible outcome from coronavirus in western countries, and you see a value shift towards higher S levels that has caused the bad outcomes from coronavirus and other bad things.

Unlike Robin Hanson, though, you aren't recommending we attempt to tell people to go off and have different values - you're simply noting that you think our tendency to make larger sacrifices is a mistake.

"...even when the trade-offs are similar, which ties into my view that simulacra and maze levels are higher, with a larger role played by fear of motive ambiguity."

This is probably the crux. I don't think we tend to go to higher simulacra levels now, compared to decades ago. I think it's always been quite prevalent, and has been roughly constant through history. While signalling explanations definitely tell us a lot about particular failings, they can't explain the reason things are worse now in certain ways, compared to before. The difference isn't because of the perennial problem of pervasive signalling. It has more to do with economic stagnation and not enough state capacity. These flaws mean useful action gets replaced by useless action, and allow more room for wasteful signalling.

As one point in favour of this model, I think it's worth noting that the historical comparisons aren't ever to us actually succeeding at dealing with pandemics in the past, but to things like "WWII-style" efforts - i.e. thinking that if we could just do x as well as we once did y then things would have been a lot better.

This implies that if you made an institution analogous to e.g. the weapons researchers of WW2 and the governments that funded them, or NASA in the 1960s, without copy-pasting 1940s/1960s society wholesale, the outcome would have been better. To me that suggests it's institution design that's the culprit, not this more ethereal value drift or increase in overall simulacra levels. There are other independent reasons to think the value shift has been mostly good, ones I talked about in my last post.

As a corollary, I also think that your mistaken predictions in the past - that we'd give up on suppression or that the control system would fizzle out, are related to this. If you think we operate at higher S levels than in the past, you'd be more inclined to think we'll sooner or later sleepwalk into a disaster. If you think there is a strong, consistent, S1 drag away from disaster, as I argued way back here, you'd expect strong control system effects that seem surprisingly immune to 'fatigue'.

Comment by Sammy Martin (SDM) on New SARS-CoV-2 variant · 2020-12-22T00:00:04.400Z · LW · GW

Update: this from Public Health England explicitly says Rt increases by 0.57: https://twitter.com/DevanSinha/status/1341132723105230848?s=20

"We find that Rt increases by 0.57 [95%CI: 0.25-1.25] when we use a fixed effect model for each area. Using a random effect model for each area gives an estimated additive effect of 0.74 [95%CI: 0.44- 1.29].

an area with an Rt of 0.8 without the new variant would have an Rt of 1.32 [95%CI:1.19-1.50] if only the VOC was present."

But for R, if it's 0.6 rather than 0.8 and the ratio is fixed, then another March-style lockdown in the UK would give R = 0.6 * (1.32/0.8) = 0.99.

Comment by Sammy Martin (SDM) on New SARS-CoV-2 variant · 2020-12-21T20:54:51.473Z · LW · GW

EDIT: doubling time would go from 17 days to 4 days (!) with the above change of numbers. This doesn't fit given what is currently observed.

The doubling time for the new strain does appear to be around 6-7 days. And the doubling time for London overall is currently 6 days.

If the '+0.66 to the mitigated Rt' and '+71% growth rate' figures are inconsistent with each other, as you say, then perhaps the second is mistaken and the +71% means that the Rt is 71% higher, not the case growth rate - which is vaguely consistent with the 'Rt is 58% higher' estimate from the absolute increase. Or '71% higher daily growth rate' could be right and the +0.66 could be referring to the R0, as you say.

This does appear to have been summarized as 'the new strain is 71% more infectious' in many places, and many people have apparently inferred the R0 is >50% higher - hopefully we're wrong.

Computer modelling of the viral spread suggests the new variant could be 70 per cent more transmissible. The modelling shows it may raise the R value of the virus — the average number of people to whom someone with Covid-19 passes the infection — by at least 0.4,

I think this is what happens when people don't show their work.

So either 'R number' is actually referring to R0 and not Rt, or 'growth rate' isn't referring to the daily growth rate but to the Rt/R0. I agree that the first is more plausible. All I'll say is that a lot of people are assuming the 70% figure, or something close to it, is a direct multiplier to the Rt, including major news organizations like the Times and the FT. But I think you're probably right, and the R0 is more like 15% larger, not 58-70% higher.

EDIT: New info from PHE seems to contradict this, https://t.co/r6GOyXFDjh?amp=1

Comment by Sammy Martin (SDM) on New SARS-CoV-2 variant · 2020-12-21T20:22:49.908Z · LW · GW

EDIT: PHE has seemingly confirmed the higher estimate for change in R, ~65%. https://t.co/r6GOyXFDjh?amp=1

What, uh, does the "71% higher growth rate" mean

TLDR: I think that it's probably barely 15% more infectious and the math of spread near equilibrium amplifies things.

I admit that I have not read all available  documents in detail, but I presume that what they said means something like "if ancestor has a doubling time of X, then variant is estimated as having a doubling time of X/(1+0.71) = 0.58X"

In the meeting minutes, the R-value (Rt) was estimated to have increased by 0.39 to 0.93, the central estimate being +0.66 - 'an absolute increase in the R-value of between 0.39 to 0.93'. Then we see 'the growth rate is 71% higher than other variants'. You're right that this is referring to the case growth rate - they're saying the daily increase is 1.71 times higher, possibly?

I'm going to estimate the relative difference in Rt of the 2 strains from the absolute difference they provided - the relative difference in Rt (Rt(new covid now)/Rt(old covid now)) in the same region, should, I think, be the factor that tells us how more infectious the new strain is.

We need to know what the pre-existing, current, Rt of just the old strain of covid-19 is. Current central estimate for covid in the UK overall is 1.15. This guess was that the 'old covid' Rt was 1.13.

The Rt of the new covid now would be 1.13 + 0.66 = 1.79, so (Rt of new covid now)/(Rt of old covid now) = 1.79/1.13 = 1.58, which implies that the Rt of the new covid is currently 58% higher than the old - and that ratio should be a constant factor, unless I'm missing something fundamental. (For what it's worth, the Rt in London, where the new strain makes up the majority of cases, is close to that 1.79 value.) So the Rt and the R0 of the new covid would be 58% higher - that would make the R0 somewhere around 4.5-5.
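Spelled out (the Rt inputs are the estimates quoted above; the old-strain R0 of ~3 is my own rough assumption, used only to recover the 4.5-5 figure):

```python
rt_old = 1.13             # estimated current Rt of the old strain in the UK
additive_increase = 0.66  # central estimate for the absolute increase in R

rt_new = rt_old + additive_increase      # 1.79
relative_increase = rt_new / rt_old      # ~1.58, i.e. ~58% higher

r0_old = 3.0                             # assumed pre-mitigation R0, old strain
r0_new = r0_old * relative_increase      # ~4.8, within the 4.5-5 range above

print(round(rt_new, 2), round(relative_increase, 2), round(r0_new, 2))
```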

Something like that rough conclusion was also reached e.g. here or here or here or here or here, with discussion of 'what if the R0 was over 5', '70% more infectious', or 'Western-style lockdown will not suppress' (though this may be confusing the daily growth rate with the R0). This estimate from different data put the Rt at 1.66, i.e. 1.66/1.13 ≈ 1.47, 47% higher, which is close-ish to the 58% estimate.

I may have made a mistake somewhere here, and those sources have made the same mistake, but this seems inconsistent with your estimate that the new covid is 15% more infectious, i.e. the Rt and R0 is 15% higher not 58% higher.

This seems like a hugely consequential question. If the Rt of the new strain is more than ~66% larger than the Rt of the old strain, then March-style lockdowns which reduced Rt to 0.6 will not work, and the covid endgame will turn into a bloody managed retreat, to delay the spread and flatten the curve for as long as possible while we try to vaccinate as many people as possible. Of course, we should just go faster regardless:

Second, we do have vaccines and so in any plausible model faster viral spread implies a faster timetable for vaccine approval and distribution.  And it implies we should have been faster to begin with. If you used to say “we were just slow enough,” you now have to revise that opinion and believe that greater speed is called for, both prospectively and looking backwards. In any plausible model.

If you are right then this is just a minor step up in difficulty.

Tom Chivers agrees with you that this is an 'amber light', Metaculus seems undecided (the probability of the UK's 2nd wave being worse than the 1st increased by 20% to 42% when this news appeared), and some of the forecasters seem to agree with you or be uncertain.

Comment by Sammy Martin (SDM) on Covid 12/17: The First Dose · 2020-12-18T19:19:39.528Z · LW · GW

On the economic front, we would have had to choose either to actually suppress the virus, in which case we get much better outcomes all around, or to accept that the virus couldn't be stopped, which also produces better economic outcomes.

Our technological advancement gave us the choice to make massively larger Sacrifices to the Gods rather than deal with the situation. And as we all know, choices are bad. We also are, in my model, much more inclined to make such sacrifices now than we were in the past,

So, by 'Sacrifices to the Gods' I assume you're referring to the entirety of our suppression spending - because it's not all been wasted money, even if a large part of it has. In other places you use that phrase to refer specifically to ineffective preventative measures.

'We also are, in my model, much more inclined to make such sacrifices now than we were in the past' - this is a very important point that I'm glad you recognise: there has been a shift in values such that we (as individuals, as well as governments) are guaranteed to take the option of attempting to avoid getting the virus, and to sacrifice the economy to a greater degree than in 1919 or 1350, because our society values human life and safety differently.

And realistically, if we'd approached this with pre-2020 values and pre-2020 technology, we'd have 'chosen' to let the disease spread and suffered a great deal of death and destruction - but that option is no longer open to us. For better, as I think, or for worse, as you think.

You can do the abstract cost-benefit calculation about whether the indirect harms of our response have caused more damage than the disease itself, but it won't tell you anything about whether the act of getting governments to stop lockdowns and suppression measures will be better or worse than having them try. Robin Hanson directly confuses these two in his argument that we are over-preventing covid.

We see variations in both kinds of policy across space and time, due both to private and government choices, all of which seem modestly influenceable by intellectuals like Caplan, Cowen, and I...

But we should also consider the very real possibility that the political and policy worlds aren’t very capable of listening to our advice about which particular policies are more effective than others. They may well mostly just hear us say “more” or “less”, such as seems to happen in medical and education spending debates.

Here Hanson is equivocating between (correctly) identifying the entire cost of COVID-19 prevention as due to 'both private and government choices' and then focussing on just 'the political and policy worlds' in response to whether we should argue for less prevention. The claim (which may or may not be true) that 'we overall are over-preventing covid relative to the abstract alternative where we don't' gets equated to 'therefore telling people to overall reduce spending on covid prevention will be beneficial on cost-benefit terms'.

Telling governments to spend less money is much more likely to work than ordering people to have different values. So making governments spend less on covid prevention diminishes their more effective preventative actions while doing very little about the source of most of the covid prevention spending (individual action).

Like-for-like comparisons where values are similar but policy is different (like Sweden and its neighbours), make it clear that given the underlying values we have, which lead to the behaviours that we have observed this year, the imperative 'prevent covid less' leads to outcomes that are across the board worse.

Or consider Sweden, which had a relatively non-panicky Covid messaging, no matter what you think of their substantive policies.  Sweden didn’t do any better on the gdp front, and the country had pretty typical adverse mobility reactions.  (NB: These are the data that you don’t see the “overreaction” critics engage with — at all.  And there is more where this came from.)

How about Brazil? While they did some local lockdowns, they have a denialist president, a weak overall response, and a population used to a high degree of risk.  The country still saw a gdp plunge and lots of collateral damage.  You might ponder this graph, causality is tricky and the “at what margin” question is trickier yet, but it certainly does not support what Bryan is claiming about the relevant trade-offs.

So, with the firm understanding that given the values we have, and the behaviour patterns we will inevitably adopt, telling people to prevent the pandemic less is worse economically and worse in terms of deaths, we can then ask the further, more abstract question that you ask - what if our values were different? That is, what if the option was available to us because we were actually capable of letting the virus rip.

I wanted to put that disclaimer in because discussing whether we have developed the right societal values is irrelevant for policy decisions going forward - but still important for other reasons. I'd be quite concerned if our value drift over the last century or so was revealed as overall maladapted, but it's important to talk about the fact that this is the question that's at stake when we ask if society is over-preventing covid. I am not asking whether lockdowns or suppression are worth it now - they are.

You seem to think that our values should be different: that it's at least plausible that signalling is leading us astray, causing us to overweight the direct damage of covid, like lives lost, at the expense of the overall damage. Unlike Robin Hanson, though, you aren't recommending we attempt to tell people to go off and have different values - you're simply noting that you think our tendency to make larger sacrifices is a mistake.

...even when the trade-offs are similar, which ties into my view that simulacra and maze levels are higher, with a larger role played by fear of motive ambiguity. We might have been willing to do challenge trials or other actual experiments, and have had a much better handle on things quicker on many levels.

There are two issues here. One is that it's not at all clear whether the initial cost-benefit calculation about over-prevention is even correct. You don't claim to know if we are over-preventing in this abstract sense (compared to a world where we had different values and individually didn't bother avoiding the disease), and the evidence that we are over-preventing comes from a Twitter poll of Bryan Caplan's extremely libertarian-inclined followers, whom he told to try as hard as possible to be objective in assessing pandemic costs because he was asking them what 'the average American' would value (come on!!). Tyler Cowen briefly alludes to how woolly the numbers are here: 'I don’t agree with Bryan’s numbers, but the more important point is one of logic'.

The second issue is whether our change in values is an aberration caused by runaway signalling or reflects a legitimate, correct valuation of human life. Now, the fact that a lot of our prevention spending has been wasteful counts in favour of the signalling explanation, but on the other hand there's a ton of evidence that, in the past, we generally valued life too little. There's also the point that this seems like exactly the kind of case where a signalling explanation is hard to falsify, an issue I talked about here:

I worry that there is a tendency to adopt self-justifying signalling explanations, where an internally complicated signalling explanation that's hard to distinguish from a simpler 'lying' explanation gets accepted, not because it's a better explanation overall but just because it has a ready answer to any objections. If 'Social cognition has been the main focus of Rationality' is true, then we need to be careful to avoid overusing such explanations. Stefan Schubert has explained how this can end up happening.

I think the correct story is that the value shift has been both good and bad - valuing human life more strongly has been good, but along with that it's become more valuable to credibly fake valuing human life, which has been bad.

Comment by Sammy Martin (SDM) on Commentary on AGI Safety from First Principles · 2020-11-25T16:28:27.402Z · LW · GW

Yeah - this is a case where how exactly the transition goes seems to make a very big difference. If it's a fast transition to a singleton, altering the goals of the initial AI is going to be super influential. But if instead there are many generations of AIs that over time come to make up the large majority of the economy and then effectively control everything, predictably altering how that goes seems a lot harder, at least.

Comparing the entirety of the Bostrom/Yudkowsky singleton intelligence explosion scenario to the slower more spread out scenario, it's not clear that it's easier to predictably alter the course of the future in the first compared to the second.

In the first, assuming you successfully set the goals of the singleton, the hard part is over and the future can be steered easily because there are, by definition, no more coordination problems to deal with. On the other hand, in the first scenario a superintelligent AGI could explode on us out of nowhere with little warning and a 'randomly rolled utility function', so the amount of coordination we'd need pre-intelligence explosion might be very large.

In the second slower scenario, there are still ways to influence the development of AI - aside from massive global coordination and legislation, there may well be decision points where two developmental paths are comparable in terms of short-term usefulness but one is much better than the other in terms of alignment or the value of the long-term future. 

Stuart Russell's claim that we need to replace 'the standard model' of AI development is one such example - if he's right, a concerted push now by a few researchers could alter how nearly all future AI systems are developed, for the better. So different conditions have to be met for it to be possible to predictably alter the future long in advance on the slow transition model (multiple plausible AI development paths that could be universally adopted and have ethically different outcomes) compared to the fast transition model (the ability to anticipate when and where the intelligence explosion will arrive and do all the necessary alignment work in time), but it's not obvious to me that one is easier to meet than the other.


For this reason, I think it's unlikely there will be a very clearly distinct "takeoff period" that warrants special attention compared to surrounding periods.

I think the period AI systems can, at least in aggregate, finally do all the stuff that people can do might be relatively distinct and critical -- but, if progress in different cognitive domains is sufficiently lumpy, this point could be reached well after the point where we intuitively regard lots of AI systems as on the whole "superintelligent."

This might be another case (like 'the AI's utility function') where we should just retire the term as meaningless, but I think that 'takeoff' isn't always a strictly defined interval, especially if we're towards the medium-slow end. The start of the takeoff has a precise meaning only if you believe that recursive self-improvement is an all-or-nothing property. In this graph from a post of mine, the light blue curve has an obvious start to the takeoff, where the gradient discontinuously changes - but what about the yellow line? There clearly is a takeoff, in the sense that progress becomes very rapid, but there's no obvious start point; there is still a period very different from our current period that is reached in a relatively short space of time - so not 'very clearly distinct', but still one that 'warrants special attention'.
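As a rough illustration of the difference (a minimal toy simulation of my own, not the exact model from that post): in the first trajectory the growth rate jumps discontinuously once a threshold is crossed, so the takeoff has a precise start; in the second, returns on capability improve smoothly, so progress still eventually becomes very rapid but there is no single point where the takeoff 'begins'.

```python
def capability_path(discontinuous: bool, dt: float = 0.05, cap: float = 1e6):
    """Toy capability trajectories under two takeoff assumptions."""
    I, t, path = 1.0, 0.0, [(0.0, 1.0)]
    while I < cap and t < 40:
        if discontinuous:
            # Recursive self-improvement 'switches on' at a threshold:
            # the growth rate jumps, giving an unambiguous start to the takeoff.
            rate = 0.1 * I if I < 10 else 1.5 * I
        else:
            # Returns to capability improve smoothly: progress eventually
            # becomes very rapid, but there is no single start point.
            rate = 0.1 * I ** 1.3
        I += rate * dt
        t += dt
        path.append((t, I))
    return path

sharp_takeoff = capability_path(discontinuous=True)
smooth_takeoff = capability_path(discontinuous=False)
```

Both trajectories end up in a regime very different from the starting one; only the first has a well-defined moment at which 'the takeoff' started.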


At this point I think it's easier to just discard the terminology altogether. For some agents, it's reasonable to describe them as having goals. For others, it isn't. Some of those goals are dangerous. Some aren't. 

Daniel Dennett's intentional stance is either a good analogy for the problem of "can't define what has a utility function" or just a rewording of the same issue. Dennett's original formulation doesn't discuss different types of AI systems or utility functions, ranging in 'explicit goal-directedness' all the way from expected-minmax game players to deep RL to purely random agents, but instead discusses physical systems ranging from thermostats up to humans. Either way, if you agree with Dennett's formulation of the intentional stance, I think you'd also agree that it doesn't make much sense to speak of 'the utility function' as necessarily well-defined.

Comment by Sammy Martin (SDM) on Covid 11/19: Don’t Do Stupid Things · 2020-11-20T18:42:48.555Z · LW · GW

Much of Europe went into strict lockdown. I was and am still skeptical that they were right to keep schools open, but it was a real attempt that clearly was capable of working, and it seems to be working.

The new American restrictions are not a real attempt, and have no chance of working.

The way I understand it, 'being effective' means making efficient choices that take into account asymmetric risk, the value of information, and long-run trade-offs. This involves things like harsh early lockdowns, throwing endless money at contact tracing, and strict enforcement of isolation. Think Taiwan and South Korea.

Then 'trying' is adopting policies that have a reasonably good chance of working, but not having a plan if they don't work, not erring on the side of caution or taking asymmetric risk into account when you adopt the policies, and not responding to new evidence quickly. The schools thing is a perfect example: closing schools has costs, while keeping them open makes the lockdown less effective and therefore longer; since it wasn't overwhelmingly clear that schools had to close to get R under 1, keeping them open was judged good enough. Partially funding tracing efforts, waiting until there's visibly no other choice and then calling a strict lockdown - that's 'trying'. Think the UK and France.

And then you have 'trying to try', which you explain in detail.

Dolly Parton helped fund the Moderna vaccine. Neat. No idea why anyone needed to do that, but still. Neat.

It's reassuring to know that if the administrative state and the pharmaceutical industry fails, we have Dolly Parton.

Comment by Sammy Martin (SDM) on Some AI research areas and their relevance to existential safety · 2020-11-20T18:22:22.622Z · LW · GW

That said, I remain interested in more clarity on what you see as the biggest risks with these multi/multi approaches that could be addressed with technical research.

A reason (though not necessarily the most important one) to think technical research into computational social choice might be useful is that examining specifically the behaviour of RL agents from a computational social choice perspective might alert us to ways in which coordination with future TAI could be similar to, or different from, the existing coordination problems we face.

(i) make direct improvements in the relevant institutions, in a way that anticipates the changes brought about by AI but will most likely not look like AI research, 

It seems premature to say, in advance of actually seeing what such research uncovers, whether the relevant mechanisms and governance improvements are exactly the same as the improvements we need for good governance generally, or different. Suppose examining the behaviour of current RL agents in social dilemmas leads to a general result, which in turn leads us to conclude there's a disproportionate chance that future TAI will coordinate in some damaging way that we could resolve with a particular new regulation. It's always possible to say that solving the single/single alignment problem will prevent anything like that from happening in the first place, but why put all your hopes on plan A when plan B is relatively neglected?

Comment by Sammy Martin (SDM) on Some AI research areas and their relevance to existential safety · 2020-11-20T18:10:34.056Z · LW · GW

Thanks for this long and very detailed post!

The MARL projects with the greatest potential to help are probably those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment, because of its potential to minimize destructive conflicts between fleets of AI systems that cause collateral damage to humanity.  That said, even this area of research risks making it easier for fleets of machines to cooperate and/or collude at the exclusion of humans, increasing the risk of humans becoming gradually disenfranchised and perhaps replaced entirely by machines that are better and faster at cooperation than humans.

In ARCHES, you mention that just examining the multiagent behaviour of RL systems (or other systems that work as toy/small-scale examples of what future transformative AI might look like) might enable us to get ahead of potential multiagent risks, or at least to try to predict how transformative AI might behave in multiagent settings. The way you describe it in ARCHES, the research would be purely exploratory:

One approach to this research area is to continually examine social dilemmas through the lens of whatever is the leading AI development paradigm in a given year or decade, and attempt to classify interesting behaviors as they emerge. This approach might be viewed as analogous to developing “transparency for multi-agent systems”: first develop interesting multi-agent systems, and then try to understand them.

But what you're suggesting in this post, 'those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment', sounds like combining computational social choice research with multiagent RL -  examining the behaviour of RL agents in social dilemmas and trying to design mechanisms that work to produce the kind of behaviour we want. To do that, you'd need insights from social choice theory. There is some existing research on this, but it's sparse and very exploratory.

My current research is attempting to build on the second of these.

As far as I can tell, that's more or less it in terms of examining RL agents in social dilemmas, so there may well be a lot of low-hanging fruit and interesting discoveries to be made. If the research is specifically about finding ways of achieving cooperation in multiagent systems by choosing the correct (e.g. voting) mechanism, is that not also computational social choice research, and therefore of higher priority by your metric?

In short, computational social choice research will be necessary to legitimize and fulfill governance demands for technology companies (automated and human-run companies alike) to ensure AI technologies are beneficial to and controllable by human society.  

...

CSC neglect:

As mentioned above, I think CSC is still far from ready to fulfill governance demands at the ever-increasing speed and scale that will be needed to ensure existential safety in the wake of “the alignment revolution”. 
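To make concrete what 'examining the behaviour of RL agents in social dilemmas' can look like at its simplest, here is a minimal sketch (my own illustration, assuming independent stateless Q-learners playing an iterated prisoner's dilemma; it is not taken from ARCHES or any particular paper):

```python
import random

# Payoff matrix for a one-shot prisoner's dilemma: (row player, column player)
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ["C", "D"]

class QLearner:
    """Stateless epsilon-greedy Q-learner: one Q-value per action."""
    def __init__(self, eps=0.1, alpha=0.1):
        self.q = {a: 0.0 for a in ACTIONS}
        self.eps, self.alpha = eps, alpha

    def act(self):
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        self.q[action] += self.alpha * (reward - self.q[action])

a, b = QLearner(), QLearner()
for _ in range(5000):
    act_a, act_b = a.act(), b.act()
    r_a, r_b = PAYOFFS[(act_a, act_b)]
    a.update(act_a, r_a)
    b.update(act_b, r_b)

print(a.q, b.q)  # both learners typically end up valuing 'D' more highly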

Comment by Sammy Martin (SDM) on The 300-year journey to the covid vaccine · 2020-11-10T13:03:45.396Z · LW · GW

The remedies for all our diseases will be discovered long after we are dead; and the world will be made a fit place to live in, after the death of most of those by whose exertions it will have been made so. It is to be hoped that those who live in those days will look back with sympathy to their known and unknown benefactors.

— John Stuart Mill, diary entry for 15 April 1854

Comment by Sammy Martin (SDM) on AGI safety from first principles: Goals and Agency · 2020-11-02T18:00:56.649Z · LW · GW

Furthermore, we should take seriously the possibility that superintelligent AGIs might be even less focused than humans are on achieving large-scale goals. We can imagine them possessing final goals which don’t incentivise the pursuit of power, such as deontological goals, or small-scale goals. 

...

My underlying argument is that agency is not just an emergent property of highly intelligent systems, but rather a set of capabilities which need to be developed during training, and which won’t arise without selection for it

Was this line of argument inspired by Ben Garfinkel's objection to the 'classic' formulation of instrumental convergence/orthogonality - that these are 'measure based' arguments that just identify that a majority of possible agents with some agentive properties and large-scale goals will optimize in malign ways, rather than establishing that we're actually likely to build such agents?

It seems like you're identifying the same additional step that Ben identified, and that I argued could be satisfied - that we need a plausible reason why we would build an agentive AI with large-scale goals.

And the same applies for 'instrumental convergence' - the observation that most possible goals, especially simple goals, imply a tendency to produce extreme outcomes when ruthlessly maximised:

  • A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  

We could see this as marking out a potential danger: a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist 'weakly suggest[s]' (Ben's words) that AGI poses an existential risk, since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we're 'shooting into the dark' in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world. There are specific reasons to think this might occur (e.g. mesa-optimisation, or sufficiently fast progress preventing us from course-correcting if there is even a small initial divergence), but those are the reasons that combine with instrumental convergence to produce a concrete risk, and they have to be argued for separately.
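As a toy illustration of the quoted point (my own sketch, not from the original source): if the objective only rewards one variable, but both variables draw on a shared budget, the optimiser drives the variable it is indifferent about to an extreme value.

```python
from scipy.optimize import linprog

# The objective only rewards x0 (say, units of some product) and is completely
# indifferent to x1 (a quantity we actually care about); both share a resource budget.
c = [-1.0, 0.0]              # linprog minimises, so -x0 means 'maximise x0'
A_ub = [[1.0, 1.0]]          # shared constraint: x0 + x1 <= 10
b_ub = [10.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, 10)])
print(res.x)                 # -> [10., 0.]: the ignored variable is pushed to its extreme
```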

Comment by Sammy Martin (SDM) on SDM's Shortform · 2020-10-30T17:04:06.134Z · LW · GW

I think that the notion of Simulacra Levels is both useful and important, especially when we incorporate Harry Frankfurt's idea of Bullshit

Harry Frankfurt's On Bullshit seems relevant here. I think it's worth trying to incorporate Frankfurt's definition as well, as it is quite widely known (see e.g. this video). If you were to do so, I think you would say that on Frankfurt's definition, Level 1 tells the truth, Level 2 lies, Level 3 bullshits about physical facts but will lie or tell the truth about things in the social realm (e.g. others' motives, your own affiliation), and Level 4 always bullshits.

How do we distinguish lying from bullshit? I worry that there is a tendency to adopt self-justifying signalling explanations, where an internally complicated signalling explanation that's hard to distinguish from a simpler 'lying' explanation gets accepted, not because it's a better explanation overall but just because it has a ready answer to any objections. If 'Social cognition has been the main focus of Rationality' is true, then we need to be careful to avoid overusing such explanations. Stefan Schubert explains how this can end up happening:

...

It seems to me that it’s pretty common that signalling explanations are unsatisfactory. They’re often logically complex, and it’s tricky to identify exactly what evidence is needed to demonstrate them.

And yet even unsatisfactory signalling explanations are often popular, especially with a certain crowd. It feels like you’re removing the scales from our eyes; like you’re letting us see our true selves, warts and all. And I worry that this feels a bit too good to some: that they forget about checking the details of how the signalling explanations are supposed to work. Thus they devise just-so stories, or fall for them.

This sort of signalling paradigm also has an in-built self-defence, in that critics are suspected of hypocrisy or naïveté. They lack the intellectual honesty that you need to see the world for what it really is, the thinking goes

Comment by Sammy Martin (SDM) on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-30T14:28:12.882Z · LW · GW

It may well be a crux. An efficient 'tree search' or similar goal-directed wrapper around a GPT-based system, one that could play a role in real-world open-ended planning (presumably planning for an agent that effects outcomes in the real world via its text generation), would have to cover continuous action spaces and states containing unknown and shifting sets of possible actions - unlike the action space of Go, which is discrete and small relative to the real universe and therefore perfect for tree search. It would also have to run (or approximate running) millions of primitive steps - individual text generations and exchanges - into the future, for long-term planning towards e.g. a multi-decade goal of the kind humans are capable of pursuing.

That sounds like a problem at least as hard as building a language-model 'success probability predictor' on top of GPT-N (probably with reward-modelling help, so it can optimize its text generation for a specific goal). Though such a system would still be highly transformative if it were human-level at prediction.

To clarify, this is Transformative not 'Radically Transformative' - transformative like Nuclear Power/Weapons, not like a new Industrial Revolution or an intelligence explosion.

I would expect tree search powered by GPT-6 to be probably pretty agentic.

I could imagine that, if you found a domain with a fairly constrained set of actions and states but one that involved text prediction somehow, you could get agentic behaviour out of a tree search like the ones we currently have + GPT-N + an RL wrapper around the GPT-N. That might well be quite transformative - I could imagine it being very good for persuasion, for example.
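For concreteness, here is a minimal sketch of what 'a tree search + GPT-N + an RL-trained scorer' could mean (entirely hypothetical interfaces: `propose` stands in for sampling candidate continuations from a language model, `value` for a learned success-probability predictor; this is not any existing API):

```python
from typing import Callable, List

Propose = Callable[[str, int], List[str]]  # (context, k) -> k candidate continuations
Value = Callable[[str], float]             # context -> estimated success probability

def greedy_text_search(context: str, propose: Propose, value: Value,
                       depth: int = 3, branching: int = 4) -> str:
    """A beam-of-one search over text continuations: expand a few candidates
    at each step and keep the one the value model scores highest. A planner
    operating over open-ended, real-world actions would need vastly deeper
    and wider search than this."""
    for _ in range(depth):
        candidates = propose(context, branching)
        if not candidates:
            break
        context = max((context + "\n" + c for c in candidates), key=value)
    return context

# Toy stand-ins so the sketch runs: 'propose' returns canned options and
# 'value' just prefers longer transcripts.
plan = greedy_text_search("Goal: persuade X to adopt policy Y.",
                          propose=lambda ctx, k: [f"candidate move {i}" for i in range(k)],
                          value=len)
```

Even this toy structure makes clear where the difficulty lies: everything interesting is hidden inside `propose` and `value`, and making `value` reliable over long horizons is the hard part.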

Comment by Sammy Martin (SDM) on Open & Welcome Thread – October 2020 · 2020-10-30T13:45:47.316Z · LW · GW

I don't know Wei Dai's specific reasons for having such a high level of concern, but I suspect that they are similar to the arguments given by the historian Niall Ferguson in this debate with Yascha Mounk on how dangerous 'cancel culture' is. Ferguson likes to try and forecast social and cultural trends years in advance and thinks that he sees a cultural-revolution like trend growing unchecked.

Ferguson doesn't give an upper bound on how bad he thinks things could get, but he thinks 'worse than McCarthyism' is reasonable to expect over the next few years, because he thinks that 'cancel culture' has broader cultural support and might also gain hard power in institutions.

Now - I am more willing to credit such worries than I was a year ago, but there's a vast gulf between a trend being concerning and expecting another Cultural Revolution. It feels too much like a direct linear extrapolation fallacy - 'things have become worse over the last year, imagine if that keeps on happening for the next six years!' I wasn't expecting a lot of what happened over the last eight months in the US on the 'cancel culture' side, but I think that a huge amount of this is due to a temporary, Trump- and Covid- and Recession-related heating up of the political discourse, not a durable shift in soft power or people's opinions. I think the opinion polls back this up. If I'm right that this will all cool down, we'll know in another year or so.

I also think Yascha's argument in that debate - that to get a Cultural Revolution-like outcome you need hard institutional power that's relatively unchecked - is really worth considering. I don't see any realistic path to that level of hard, governmental power at enough levels being held by any group in the US.