Posts

SDM's Shortform 2020-07-23T14:53:52.568Z
Modelling Continuous Progress 2020-06-23T18:06:47.474Z
Coronavirus as a test-run for X-risks 2020-06-13T21:00:13.859Z
Will AI undergo discontinuous progress? 2020-02-21T22:16:59.424Z
The Value Definition Problem 2019-11-18T19:56:43.271Z

Comments

Comment by sdm on Mathematical Models of Progress? · 2021-02-16T15:47:42.268Z · LW · GW

I made an attempt to model intelligence explosion dynamics in this post, by making the very oversimplified exponential-returns-to-exponentially-increasing-intelligence model used by Bostrom and Yudkowsky slightly less oversimplified.

This post tries to build on a simplified mathematical model of takeoff which was first put forward by Eliezer Yudkowsky and then refined by Bostrom in Superintelligence, modifying it to account for the different assumptions behind continuous, fast progress as opposed to discontinuous progress. As far as I can tell, few people have touched these sorts of simple models since the early 2010’s, and no-one has tried to formalize how newer notions of continuous takeoff fit into them. I find that it is surprisingly easy to accommodate continuous progress and that the results are intuitive and fit with what has already been said qualitatively about continuous progress.

The post includes Python code for the model.

This post doesn't capture all views of takeoff - in particular, it doesn't capture the non-hyperbolic faster-growth-mode scenario, where marginal intelligence improvements become exponentially more difficult, so we get a (continuous or discontinuous) switch to a new exponential growth mode rather than runaway hyperbolic growth.

But I think that by modifying the f(I) function that determines how RSI capability varies with intelligence we can incorporate such views.

(In the context of the exponential model given in the post, that would correspond to an f(I) function which levels off at a constant value rather than growing with I, which would result in a continuous (with abruptness determined by the size of d) switch to a single faster exponential growth mode.)
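As a rough sketch of what I mean (this is illustrative Python, not the code from the post - I_AGI, d and f_max are stand-in names and values of my own):

```python
import numpy as np

# Minimal sketch: intelligence I grows as dI/dt = (c + f(I)) * I, where f(I) is
# the extra growth rate unlocked by recursive self-improvement. If f saturates
# at a ceiling f_max instead of growing without bound, the trajectory switches
# (smoothly, over a width set by d) from one exponential mode (rate c) to a
# faster one (rate c + f_max), rather than going hyperbolic.

def f(I, I_AGI=10.0, d=2.0, f_max=0.2):
    """Saturating RSI term: ~0 below I_AGI, ~f_max above it, transition width d."""
    return f_max / (1.0 + np.exp(-(I - I_AGI) / d))

def simulate(c=0.05, I0=1.0, dt=0.01, T=200.0):
    steps = int(T / dt)
    I = np.empty(steps)
    I[0] = I0
    for t in range(1, steps):
        I[t] = I[t - 1] + dt * (c + f(I[t - 1])) * I[t - 1]
    return I

trajectory = simulate()
# Early on the growth rate is ~c; later it is ~c + f_max: two exponential modes.
print(trajectory[0], trajectory[-1])
```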

But I think the model still roughly captures the intuition behind scenarios that involve either a continuous or a discontinuous step to an intelligence explosion.

Comment by sdm on The Meaning That Immortality Gives to Life · 2021-02-16T12:21:47.180Z · LW · GW

Modern literature about immortality is written primarily by authors who expect to die, and their grapes are accordingly sour. 

This is still just as true as when this essay was written, I think - even the Culture had its human citizens mostly choosing to die after a time... to the extent that I eventually decided: if you want something done properly, do it yourself.

But there are exceptions - the best example of published popular fiction that has immortality as a basic fact of life is the Commonwealth Saga by Peter F Hamilton and the later Void Trilogy (the first couple of books were out in 2007).

The Commonwealth has effective immortality. A few downsides of it are even noticeable (their culture and politics are a bit more stagnant than we might like), but there's never any doubt at all that it's worth it, and it's barely commented on in the story.

In truth, I suspect that if people were immortal, they would not think overmuch about the meaning that immortality gives to life. 

(Incidentally, the latter-day Void Trilogy Commonwealth is probably the closest a work of published fiction has come to depicting a true eudaimonic utopia that lacks the problems of the Culture.)

I wonder if there's been any harder-to-detect shift in how immortality is portrayed in fiction since 2007? Is it still as rare now as it was then to depict it as anything other than a bad thing?

Comment by sdm on Covid 2/11: As Expected · 2021-02-12T12:05:38.568Z · LW · GW

The UK vaccine rollout is considered a success, and by the standards of other results, it is indeed a success. This interview explains how they did it, which was essentially ‘make deals with companies and pay them money in exchange for doses of vaccines.’

A piece of this story you may find interesting (as an example of a government minister making a decision based on object-level physical considerations): multiple reports say Matt Hancock, the UK's Health Secretary, made the decision to insist on over-ordering vaccines because he saw the movie Contagion and was shocked into viscerally realising how important a speedy rollout was.

https://www.economist.com/britain/2021/02/06/after-a-shaky-start-matt-hancock-has-got-the-big-calls-right

It might just be a nice piece of PR, but even if that's the case, it's still a good illustration of how object-level physical considerations can intrude into government decision-making.

Comment by sdm on Review of Soft Takeoff Can Still Lead to DSA · 2021-02-06T16:08:26.501Z · LW · GW

I agree with your argument about likelihood of DSA being higher compared to previous accelerations, due to society not being able to speed up as fast as the technology. This is sorta what I had in mind with my original argument for DSA; I was thinking that leaks/spying/etc. would not speed up nearly as fast as the relevant AI tech speeds up.

Your post on 'against GDP as a metric' argues more forcefully for the same thing that I was arguing for, that 

'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? 

So we're on the same page there that it's not likely that 'the economic doubling time' captures everything that's going on all that well, which leads to another problem - how do we predict what level of capability is necessary for a transformative AI to obtain a DSA (or reach the PONR for a DSA)?

I notice that in your post you don't propose an alternative metric to GDP, which is fair enough, since most of your arguments seem to lead to the conclusion that it's almost impossibly difficult to predict in advance what level of advantage over the rest of the world, and in which areas, is actually needed to conquer the world - since we seem to be able to analogize persuasion tools, or conquistador-analogues who had relatively small tech advantages, to the AGI situation.

I think that there is still a useful role for raw economic power measurements, in that they provide a sort of upper bound on how much capability difference is needed to conquer the world. If an AGI acquires resources equivalent to controlling >50% of the world's entire GDP, it can probably take over the world if it goes for the maximally brute-force approach of just using direct military force. Presumably the PONR for that situation would be a while before then, but at least we know that an advantage of a certain size would be big enough given no assumptions about the effectiveness of unproven technologies of persuasion or manipulation, or about specific vulnerabilities in human civilization.

So we can use our estimate of how the doubling time may increase, anchor on that gap, and estimate downward based on how soon we think the PONR is, or on how many 'cheat' pathways there are that don't involve economic growth.

I got the whole idea of using brute economic advantage as an upper-limit 'anchor' from Ajeya's post about using biological anchors to forecast what's required for TAI - if we could find a reasonable lower bound for the amount of advantage needed to attain a DSA, we could put the same kind of estimated distribution between the two. We would just need that lower limit - maybe there's a way of estimating it from the upper limit of human ability, since we know no actually existing human has used persuasion to take over the world, though as you point out some have come relatively close.

I realize that's not a great method, but given that this is a situation we've never encountered before, is there any better alternative for trying to predict what level of capability is necessary for a DSA? Or perhaps you just think that anchoring your prior estimate on economic power advantage as an upper bound is so misleading that it's worse than having a completely ignorant prior. In that case, we might have to say that there are just so many unprecedented ways a transformative AI could obtain a DSA that we can have no idea in advance what capability is needed - which doesn't feel quite right to me.
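To make the 'estimated distribution between the anchors' idea concrete, here's a minimal sketch - the anchor values and the log-uniform spread are placeholder assumptions of mine, not actual estimates:

```python
import numpy as np

# Illustrative only: put a distribution over "share of world resources/capability
# needed for a DSA", spread between a lower anchor (a rough upper bound on what
# human-level persuasion/conquest has achieved) and the brute-force upper anchor
# (~50% of world output). Both anchor values are placeholders.
rng = np.random.default_rng(0)

lower_anchor = 0.01   # hypothetical: ~1% of world resources
upper_anchor = 0.50   # the brute-force upper bound discussed above

# Log-uniform between the anchors, as one simple way to spread credence.
samples = np.exp(rng.uniform(np.log(lower_anchor), np.log(upper_anchor), 100_000))

for q in (0.1, 0.5, 0.9):
    print(f"{int(q * 100)}th percentile share needed: {np.quantile(samples, q):.3f}")
```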

Comment by sdm on Ten Causes of Mazedom · 2021-01-18T13:35:19.544Z · LW · GW

Finally got round to reading your sequence, and it looks like we disagree a lot less than I thought, since your first three causes are exactly what I was arguing for in my reply:

This is probably the crux. I don't think we tend to go to higher simulacra levels now, compared to decades ago. I think it's always been quite prevalent, and has been roughly constant through history. While signalling explanations definitely tell us a lot about particular failings, they can't explain the reason things are worse now in certain ways, compared to before. The difference isn't because of the perennial problem of pervasive signalling. It has more to do with economic stagnation and not enough state capacity. These flaws mean useful action gets replaced by useless action, and allow more room for wasteful signalling.

As one point in favour of this model, I think it's worth noting that the historical comparisons aren't ever to us actually succeeding at dealing with pandemics in the past, but to things like "WWII-style" efforts - i.e. thinking that if we could just do x as well as we once did y then things would have been a lot better.

This implies that if you made an institution analogous to e.g. the weapons researchers of WW2 and the governments that funded them, or NASA in the 1960s, without copy-pasting 1940s/1960s society wholesale, the outcome would have been better. To me that suggests it's institution design that's the culprit, not this more ethereal value drift or increase in overall simulacra levels.

I think you'd agree with most of that, except that you see a much more significant causal role for cultural factors like increased fragility and social atomisation. There is pretty solid evidence that both are real problems - Jon Haidt presents the best case for taking them seriously - although it's not as definitive as you make out (e.g. suicide rates are basically a random walk). Your explanation for how they lead to institutional problems is reasonable, but I wonder whether they're even needed as explanations when your first three causes are so strong and obvious.

Essentially I see your big list like this:

Main Drivers:

Cause 1: More Real Need For Large Organizations (includes decreasing low hanging fruit)
Cause 2: Laws and Regulations Favor Large Organizations
Cause 3: Less Disruption of Existing Organizations
Cause 5: Rent Seeking is More Widespread and Seen as Legitimate

Real but more minor:

Cause 4: Increased Demand for Illusion of Safety and Security
Cause 8: Atomization and the Delegitimization of Human Social Needs
Cause 7: Ignorance
Cause 9: Educational System
Cause 10: Vicious Cycle

No idea but should look into:

Cause 6: Big Data, Machine Learning and Internet Economics

Essentially my view is that if you directly addressed the main drivers with large legal or institutional changes the other causes of mazedom wouldn't fight back.

I believe that the 'obvious legible institutional risks first' view is in line with what others who've written on this problem like Tyler Cowen or Sam Bowman think, but it's a fairly minor disagreement since most of your proposed fixes are on the institutional side of things anyway.

Also, the preface is very important - these are some of the only trends that seem to be going the wrong way consistently in developed countries for a while now, and they're exactly the forces you'd expect to be hardest to resist.

The world is better for people than it was back then. There are many things that have improved. This is not one of them.

Comment by sdm on Review of Soft Takeoff Can Still Lead to DSA · 2021-01-10T20:29:34.382Z · LW · GW

Currently the most plausible doom scenario in my mind is maybe a version of Paul’s Type II failure. (If this is surprising to you, reread it while asking yourself what terms like “correlated automation failure” are euphemisms for.) 

This is interesting, and I'd like to see you expand on this. Incidentally I agree with the statement, but I can imagine both more and less explosive, catastrophic versions of 'correlated automation failure'. On the one hand it makes me think of things like transportation and electricity going haywire, on the other it could fit a scenario where a collection of powerful AI systems simultaneously intentionally wipe out humanity.

Clock-time leads shrink automatically as the pace of innovation speeds up, because if everyone is innovating 10x faster, then you need 10x as many hoarded ideas to have an N-year lead. 

What if, as a general fact, some kinds of progress (the technological kinds more closely correlated with AI) are just much more susceptible to speed-up? I.e., what if 'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly, but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? In that case, if the parts of overall progress that affect the likelihood of leaks, theft and spying aren't sped up by as much as the rate of actual technological progress, the likelihood of DSA could rise to be quite high compared to previous accelerations, where the speed-up occurred on a timescale slow enough to allow society to 'speed up' in the same way.

In other words - it becomes easier to hoard more and more ideas if the capacity for leaking and spying stays roughly constant while the pace of progress increases. Since a lot of the 'technologies' that facilitate leaks and spying are in the social realm, this seems plausible.
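A toy illustration of that point (all the numbers and parameter names here are made up): if leaks and spying remove a roughly fixed number of ideas per year while the innovation rate speeds up, the leader's lead measured in clock time gets larger, not smaller.

```python
# Toy model: a lead in years is hoarded_ideas / ideas_per_year. If the leak rate
# is a fixed number of ideas per year (limited by slower-moving social processes)
# while the field innovates 10x faster, the clock-time lead grows.

def lead_in_years(ideas_per_year, leaked_ideas_per_year, years, own_share=0.3):
    hoard = 0.0
    for _ in range(years):
        hoard += own_share * ideas_per_year              # ideas the leader generates and keeps
        hoard = max(0.0, hoard - leaked_ideas_per_year)  # a roughly fixed leak/spying capacity
    return hoard / ideas_per_year                        # convert back to a clock-time lead

print(lead_in_years(ideas_per_year=100, leaked_ideas_per_year=20, years=5))   # 0.5 years
print(lead_in_years(ideas_per_year=1000, leaked_ideas_per_year=20, years=5))  # 1.4 years
```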

But if you do need more hoarded ideas for the same clock-time lead, this might just mean that only a very large initial lead can be turned into a DSA - which you still seem to agree with:

  • Even if takeoff takes several years it could be unevenly distributed such that (for example) 30% of the strategically relevant research progress happens in a single corporation. I think 30% of the strategically relevant research happening in a single corporation at beginning of a multi-year takeoff would probably be enough for DSA.
Comment by sdm on Fourth Wave Covid Toy Modeling · 2021-01-10T10:37:49.241Z · LW · GW

I meant 'based on what you've said about Zvi's model' - i.e. nostalgebraist says Zvi's model implies Rt never goes below 1. If you look at the plot he produced, Rt is always above 1 given Zvi's assumptions, which the London data falsified.

Comment by sdm on Fourth Wave Covid Toy Modeling · 2021-01-09T19:33:11.765Z · LW · GW
  • It seems better to first propose a model we know can match past data, and then add a tuning term/effect for "pandemic fatigue" for future prediction.

To get a sense of scale, here is one of the plots from my notebook:

https://64.media.tumblr.com/823e3a2f55bd8d1edb385be17cd546c7/673bfeb02b591235-2b/s640x960/64515d7016eeb578e6d9c45020ce1722cbb6af59.png

The colored points show historical data on R vs. the 6-period average, with color indicating the date.

Thanks for actually plotting historical Rt vs infection rates!

Whereas, it seems more natural to take (3) as evidence that (1) was wrong.

In my own comment, I also identified the control-system model's assumption of any kind of proportionality between Rt and infections as a problem. Based on my own observations of behaviour and government response, the MNM hypothesis (governments hitting the panic button as imminent death approaches, i.e. as hospitals begin to be overwhelmed) seems more likely than a response that ramps up in proportion to recent infections. I think that explains the tight oscillations.

I'd say the dominant contributor to control systems is something like a step function at a particular level near where hospitals are overwhelmed, and individual responses proportionate to exact levels of infection are a lesser part of it.

You could maybe operationalize this by looking at past hospitalization rates, fitting a logistic curve to them at the 'overwhelmed' threshold and seeing if that predicts Rt. I think it would do pretty well.
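Here's a sketch of what that operationalization might look like (the data below is synthetic, standing in for a real occupancy/Rt series, and the parameter names are mine):

```python
import numpy as np
from scipy.optimize import curve_fit

# Model Rt as a step-like (logistic) function of hospital occupancy: a high
# "relaxed" level that drops to a low "panic button" level as occupancy
# approaches the overwhelm threshold. Fit it to historical (occupancy, Rt)
# pairs and compare against a model where Rt falls in proportion to cases.

def step_control(occupancy, r_relaxed, r_panic, threshold, sharpness):
    z = np.clip(sharpness * (occupancy - threshold), -50, 50)
    return r_panic + (r_relaxed - r_panic) / (1 + np.exp(z))

# Synthetic stand-in for historical data.
occupancy = np.linspace(0.2, 1.2, 60)   # fraction of surge capacity in use
rt_observed = step_control(occupancy, 1.3, 0.8, 0.9, 12.0)
rt_observed += np.random.default_rng(1).normal(0, 0.03, occupancy.size)

params, _ = curve_fit(step_control, occupancy, rt_observed, p0=[1.2, 0.9, 1.0, 5.0])
print("fitted (r_relaxed, r_panic, threshold, sharpness):", np.round(params, 2))
```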

This tight control was a surprise and is hard to reproduce in a model, but if our model doesn't reproduce it, we will go on being surprised by the same thing that surprised us before.

My own predictions are essentially based on expecting the 'tight control' to continue somehow - i.e. cases flattening out, or declining a bit, at a very high level after a large swing upwards.

It looks like Rt is currently just below 1 in London (the subsequent couple of days of data seem to confirm this) - which would outright falsify any model claiming that, given our control-system response, Rt never goes below 1 at any level of infection with the new variant. According to your graph, that is exactly what the infections-exponential model predicts.

If you ran this model on the past, what would it predict? Based on what you've said, Rt never goes below one, so your diagram implies a huge first wave with a rapid rise to partial herd immunity over a few weeks. That's the exact same predictive error that was made last year.

I note - outside view - that this is very similar to the predictive mistake made last February/March with old Covid-19 - many around here were practically certain we were bound for an immediate (in a month or two) enormous herd immunity overshoot.

Comment by sdm on Eight claims about multi-agent AGI safety · 2021-01-07T19:48:32.210Z · LW · GW

Humans have skills and motivations (such as deception, manipulation and power-hungriness) which would be dangerous in AGIs. It seems plausible that the development of many of these traits was driven by competition with other humans, and that AGIs trained to answer questions or do other limited-scope tasks would be safer and less goal-directed. I briefly make this argument here.

Note that he claims that this may be true even if single/single alignment is solved, and all AGIs involved are aligned to their respective users.

It strikes me as interesting that much of the existing work that's been done on multiagent training, such as it is, focusses on just examining the behaviour of artificial agents in social dilemmas. The thinking seems to be - and this was also suggested in ARCHES - that it's useful just for exploratory purposes to try to characterise how and whether RL agents cooperate in social dilemmas, what mechanism designs and what agent designs promote what types of cooperation, and if there are any general trends in terms of what kinds of multiagent failures RL tends to fall into.

For example, it's generally known that regular RL tends to fail to cooperate in social dilemmas: 'Unfortunately, selfish MARL agents typically fail when faced with social dilemmas'. From ARCHES:

One approach to this research area is to continually examine social dilemmas through the lens of whatever is the leading AI development paradigm in a given year or decade, and attempt to classify interesting behaviors as they emerge. This approach might be viewed as analogous to developing “transparency for multi-agent systems”: first develop interesting multi-agent systems, and then try to understand them.

There seems to be an implicit assumption here that something very important and unique to multiagent situations would be uncovered - by analogy to things like the flash crash. It's not clear to me that we've examined the intersection of RL and social dilemmas enough that we would have noticed this if it were true, and I think that's the major justification for working on this area.
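As a concrete toy example of the kind of exploratory experiment this points at (not the setup of any particular paper - just two independently trained tabular Q-learners playing an iterated prisoner's dilemma), selfish decentralised training usually drifts toward mutual defection:

```python
import numpy as np

# Two selfish, independently trained tabular Q-learners in an iterated
# prisoner's dilemma. Each agent's state is just the opponent's previous action.
# With purely selfish rewards they usually end up mostly defecting - the basic
# "selfish MARL fails in social dilemmas" observation, in miniature.

rng = np.random.default_rng(0)
PAYOFFS = {  # (my action, their action) -> my reward; 0 = cooperate, 1 = defect
    (0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0,
}

def run(episodes=20_000, alpha=0.1, gamma=0.9, eps=0.1):
    q = [np.zeros((2, 2)), np.zeros((2, 2))]   # per-agent Q-table: [state, action]
    state = [0, 0]
    coop_rate = 0.0
    for t in range(episodes):
        acts = []
        for i in range(2):
            if rng.random() < eps:                     # epsilon-greedy exploration
                acts.append(int(rng.integers(2)))
            else:
                acts.append(int(np.argmax(q[i][state[i]])))
        for i in range(2):
            r = PAYOFFS[(acts[i], acts[1 - i])]
            next_state = acts[1 - i]                   # remember what the opponent just did
            q[i][state[i], acts[i]] += alpha * (
                r + gamma * q[i][next_state].max() - q[i][state[i], acts[i]]
            )
            state[i] = next_state
        if t >= episodes - 1000:                       # measure late-training behaviour
            coop_rate += (2 - sum(acts)) / 2 / 1000
    return coop_rate

print("late-training cooperation rate:", run())  # usually well below 0.5
```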

Comment by sdm on Fourth Wave Covid Toy Modeling · 2021-01-07T14:11:35.278Z · LW · GW

One thing that you didn't account for: the method of directly scaling Rt by the multiple on the R0 (which seems to be around 1.55) is only a rough estimate of how much Rt will increase when the effective Rt has been lowered in a particular situation. It could be almost arbitrarily wrong - intuitively, if the hairdressers are closed, that prevents 100% of transmission in hairdressers, no matter how much higher the R0 of the virus is.

For this reason, the actual epidemiological models (there aren't any for the US for the new variant, only some for the UK) have some more complicated way of predicting the effect of control measures. This is from Imperial College:

We quantified the transmission advantage of the VOC relative to non-VOC lineages in two ways: as an additive increase in R that ranged between 0.4 and 0.7, and alternatively as a multiplicative increase in R that ranged between a 50% and 75% advantage. We were not able to distinguish between these two approaches in goodness-of-fit, and either is plausible mechanistically. A multiplicative transmission advantage would be expected if transmissibility had increased in all settings and individuals, while an additive advantage might reflect increases in transmissibility in specific subpopulations or contexts.

The multiplicative 'increased transmissibility' estimate will therefore tend to underestimate the effect of control measures. The actual paper did some complicated Bayesian regression to try and figure out which model of Rt change worked best, and couldn't figure it out.

Measures like ventilation, physical distancing when you do decide to meet up, and mask use will be more multiplicative in how the new variant diminishes their effect. The parts of the behaviour response that involve people just not deciding to meet up or do things in the first place, and anything involving mandatory closures of schools, bars etc. will be less multiplicative.

 

I believe this is borne out in the early data. Lockdown 1 in the UK took Rt down to 0.6. The naive 'multiplicative' estimate would say that's sufficient for the new variant, Rt=0.93. The second lockdown took Rt down to 0.8, which would be totally insufficient. You'd need Rt for the old variant of covid down to 0.64 on the naive multiplicative estimate - almost what was achieved in March. I have a hard time believing it was anywhere near that low in the Tier 4 regions around Christmas.
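To make the comparison explicit, using the ~1.55 multiplier from above and the additive 0.4-0.7 range from the Imperial College quote:

```python
# The two adjustment rules applied to the lockdown Rt values in this comment.
# They diverge most at low Rt, which is exactly where lockdown effectiveness
# is being judged.
multiplier = 1.55
additive_low, additive_high = 0.4, 0.7

for label, old_rt in [("Lockdown 1 (March 2020)", 0.6), ("Lockdown 2 (November 2020)", 0.8)]:
    print(f"{label}: multiplicative -> {old_rt * multiplier:.2f}, "
          f"additive -> {old_rt + additive_low:.2f} to {old_rt + additive_high:.2f}")
```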

But the data that's come in so far seems to indicate that Tier 4 + Schools closed has either levelled off or caused slow declines in infections in those regions where they were applied.

First, the random infection survey - London and the South East are in decline and the East of England has levelled off (page 3). The UK's symptom study, which uses a totally different methodology, confirms some levelling off and declines in those regions (page 6). It's early days, but clearly Rt is very near 1, and likely below 1 in London. The Financial Times cottoned on to this a few days late, but no-one else seems to have noticed.

I think this indicates a bunch of things - mainly that infections caused by the new variant can and will be stabilized or even reduced by lockdown measures which people are willing to obey. It's not impossible if it's already happening.

 

To start, let’s also ignore phase shifts like overloading hospitals, and ignore fatigue on the hopes that vaccines coming soon will cancel it out, although there’s an argument that in practice some people do the opposite.

I agree with ignoring fatigue, but ignoring phase shifts? If it were me I'd model the entire control system response as a phase shift with the level for the switch in reactions set near the hospital overwhelm level - at least on the policy side, there seems to be an abrupt reaction specifically to the hospital overloading question. The British government pushed the panic button a few days ago in response to that and called a full national lockdown. I'd say the dominant contributor to control systems is something like a step function at a particular level near where hospitals are overwhelmed, and individual responses proportionate to exact levels of infection are a lesser part of it.

I think the model of the control system as a continuous response is wrong, and a phased all-or-nothing response on the government side of things, plus taking into account non-multiplicative effects on the Rt, would produce overall very different results - namely that a colossal overshoot of herd immunity in a mere few weeks is probably not happening. I note - outside view - that this is very similar to the predictive mistake made last February/March with old Covid-19 - many around here were practically certain we were bound for an immediate (in a month or two) enormous herd immunity overshoot.

Comment by sdm on Covid 12/31: Meet the New Year · 2021-01-05T11:40:03.554Z · LW · GW

Many of the same thoughts were in my mind when I linked that study on the previous post.

----

IMO, it would help clarify arguments about the "control system" a lot to write down the ideas in some quantitative form.

...

This tells you nothing about the maximum power of my heating system.  In colder temperatures, it'd need to work harder, and at some low enough temperature T, it wouldn't be able to sustain 70F inside.  But we can't tell what that cutoff T is until we reach it.  "The indoor temperature right now oscillates around 70F" doesn't tell you anything about T.

I agree, and in fact the main point I was getting at with my initial comment is that in the two areas I talked about - the control system, and the overall explanation for failure - there's an unfortunate tendency to toss out quantitative arguments, or even detailed models of the world, and instead resort to intuitions and qualitative arguments. That then tends to turn into a referendum on your personal opinions about human nature and the human condition, which isn't that useful for predicting anything. You can see this in how the predictions panned out - as an anonymous commenter pointed out, control-system 'running out of power' arguments generally haven't been that predictively accurate on these questions.

The rule-of-thumb that I've used - the Morituri Nolumus Mori effect - has fared somewhat better than the 'control system will run out of steam sooner or later' rule-of-thumb, both when I wrote that post and since. The MNM tends to predict last-minute myopic decisions that mostly avoid the worst outcomes, while the 'out of steam' explanation led people to predict that social distancing would mostly be over by now. But neither is a proper quantitative model.

In terms of actually giving this question some quantitative rigour - it's not easy. I made an effort in my old post, suggesting that how far a society can stray from the control-system equilibrium is indicated by how low it managed to get Rt - but the 'gold standard' is to work off model projections trained on already-existing data, as I tried to do.

 

As to the second question - the overall explanation - there is some data to work off, but not much. We know that preexisting measures of state capacity don't predict covid response effectiveness, which, along with other evidence, suggests the 'institutional sclerosis' hypothesis I referred to in my original post. Once again, I think that a clear mechanism - 'institutional sclerosis as part of the great stagnation' - is a much better starting point for unravelling all this than the 'simulacra levels are higher now' perspective that I see a lot around here. That claim is too abstract to easily falsify or to derive genuine in-advance predictions from.

Comment by sdm on Covid 12/31: Meet the New Year · 2021-01-01T14:07:59.917Z · LW · GW

I live in Southern England and so have a fair bit of personal investment in all this, but I'll try to be objective. My first reaction, upon reading the LSHTM paper that you referred to, is 'we can no longer win, but we can lose less' - i.e. we are all headed for herd immunity one way or another by mid-year, but we can still do a lot to protect people. That would have been my headline - it's over for suppression and elimination, but 'it's over' isn't quite right. Your initial reaction was different:

Are We F***ed? Is it Over?

Yeah, probably. Sure looks like it.

The twin central points last week were that we were probably facing a much more infectious strain (70%), and that if we are fucked in this way, then it is effectively already over in the sense that our prevention efforts would be in vain.

The baseline scenario remains, in my mind, that the variant takes over some time in early spring, the control system kicks in as a function of hospitalizations and deaths so with several weeks lag, and likely it runs out of power before it stabilizes things at all, and we blow past herd immunity relatively quickly combining that with our vaccination efforts.

You give multiple reasons to expect this, all of which make complete sense - Lockdown fatigue, the inefficiency of prevention, lags in control systems, control systems can't compensate etc. I could give similar reasons to expect the alternative - mainly that the MNM predicts the extreme strength of control systems and that it looks like many places in Europe/Australia did take Rt down to 0.6 or even below!

But luckily, none of that is necessary.

This preprint model via the LessWrong thread has a confidence interval for increased infectiousness of 50%-74%.

I would encourage everyone to look at the scenarios in this paper, since they neatly explain exactly what we're facing and mean we don't have to rely on guesstimate models and inference about behaviour changes. This model is likely highly robust - it successfully predicted the course of the UK's previous lockdown, with whatever compliance we had then. They simply updated it by putting in the increased infectiousness of the new variant. Since that last lockdown was very recent, compliance isn't going to be wildly different; the weather was cold during the previous lockdown, schools were open, etc. The estimate for the increase in R given in this paper seems to be the same as that given by other groups, e.g. Imperial College.

So what does the paper imply? Essentially, a Level 4 lockdown (median estimate) flattens out case growth, but with schools closed an L4 lockdown causes cases to decline a bit (page 10). Increasing the vaccination rate tenfold, from 200,000 to 2 million per week, reduces the overall number of deaths by more than half (page 11). And they only model a one-month lockdown, but that still makes a significant difference to overall deaths (page 11). We managed 500k vaccinations the first week, and it dropped a bit the second week, but with first-doses-first and the Oxford/AZ vaccine it should increase again and land somewhere between those two scenarios. Who knows where? For the US, the fundamental situation may look like the first scenario - no lockdowns at all - so have a look.

(Also of note is that the peak demand on the medical system, even in the bad scenarios with a Level 4 lockdown and schools open, is less than 1.5x what was seen during the first peak. That's certainly enough to boost the IFR and could be described as 'healthcare system collapse', since it means surge capacity being used and healthcare workers being wildly overstretched, but to my mind 'collapse' refers to demand that exceeds supply by many multiples, such that most people can't get any proper care at all - as was talked about in late February/early March.)

(Edit: the level of accuracy of the LSHTM model should become clear in a week or two)

The nature of our situation now is such that every day of delay and every extra vaccinated person makes us incrementally better off.

This is a simpler situation than before - before we had the option of suppression, which is all-or-nothing - either you get R under 1 or you don't. The race condition that we're in now, where short lockdowns that temporarily hold off the virus buy us useful time, and speeding up vaccination increases herd immunity and decreases deaths and slackens the burden on the medical system, is a straightforward fight by comparison. You just do whatever you can to beat it back and vaccinate as fast as you can.

Now, I don't think you really disagree with me here, except about some minor factual details (I reckon your pre-existing intuitions about what 'Level 4 lockdown' would be capable of doing are different to mine), and you mention the extreme urgency of speeding up vaccine rollout often,

We also have a vaccination crisis. WIth the new strain coming, getting as many people vaccinated as fast as possible becomes that much more important.

...

With the more reasonable version of this being “we really really really should do everything to speed up our vaccinations, everyone, and to focus them on those most likely to die of Covid-19.” That’s certainly part of the correct answer, and likely the most important one for us as a group.

But if I were writing this, my loud headline message would not have been 'It's over', because none of this is over - many decisions still matter. It's only 'over' for the possibility of long-term suppression.

*****

There's also the much broader point - the 'what, precisely, is wrong with us' question. This is very interesting and complex and deserves a long discussion of its own. I might write one at some point. I'm just giving some initial thoughts here, partly a very delayed response to your reply to me 2 weeks ago (https://www.lesswrong.com/posts/Rvzdi8RS9Bda5aLt2/covid-12-17-the-first-dose?commentId=QvYbhxS2DL4GDB6hF). I think we have a hard-to-place disagreement about some of the ultimate causes of our coronavirus failures.

We got a shout-out in Shtetl-Optimized, as he offers his “crackpot theory” that if we were a functional civilization we might have acted like one and vaccinated everyone a while ago

...

I think almost everyone on earth could have, and should have, already been vaccinated by now. I think a faster, “WWII-style” approach would’ve saved millions of lives, prevented economic destruction, and carried negligible risks compared to its benefits. I think this will be clear to future generations, who’ll write PhD theses exploring how it was possible that we invented multiple effective covid vaccines in mere days or weeks

He's totally right on the facts, of course. The question is what to blame. I think our disagreement here, as revealed in our last discussion, is interesting. The first-order answer is institutional sclerosis: an inability to properly do expected-value reasoning and respond rapidly to new evidence. We all agree on that and all see the problem. You said to me:

And I agree that if government is determined to prevent useful private action (e.g. "We have 2020 values")...

Implying, as you've said elsewhere, that the malaise has a deeper source. When I said "2020 values" I referred to our overall greater valuation of human life, while you took it to refer to our tendency to interfere with private action - something you clearly think is deeply connected to the values we (individuals and governments) hold today.

I see a long-term shift towards a greater valuation of life that has been mostly positive, with some other cause producing the terrible outcomes from coronavirus in Western countries; you see a value shift towards higher simulacra levels that has caused the bad outcomes from coronavirus, along with other bad things.

Unlike Robin Hanson, though, you aren't recommending we attempt to tell people to go off and have different values - you're simply noting that you think our tendency to make larger sacrifices is a mistake.

"...even when the trade-offs are similar, which ties into my view that simulacra and maze levels are higher, with a larger role played by fear of motive ambiguity."

This is probably the crux. I don't think we tend to go to higher simulacra levels now, compared to decades ago. I think it's always been quite prevalent, and has been roughly constant through history. While signalling explanations definitely tell us a lot about particular failings, they can't explain the reason things are worse now in certain ways, compared to before. The difference isn't because of the perennial problem of pervasive signalling. It has more to do with economic stagnation and not enough state capacity. These flaws mean useful action gets replaced by useless action, and allow more room for wasteful signalling.

As one point in favour of this model, I think it's worth noting that the historical comparisons aren't ever to us actually succeeding at dealing with pandemics in the past, but to things like "WWII-style" efforts - i.e. thinking that if we could just do x as well as we once did y then things would have been a lot better.

This implies that if you made an institution analogous to e.g. the weapons researchers of WW2 and the governments that funded them, or NASA in the 1960s, without copy-pasting 1940s/1960s society wholesale, the outcome would have been better. To me that suggests it's institution design that's the culprit, not this more ethereal value drift or increase in overall simulacra levels. There are other independent reasons to think the value shift has been mostly good, ones I talked about in my last post.

As a corollary, I also think that your mistaken predictions in the past - that we'd give up on suppression, or that the control system would fizzle out - are related to this. If you think we operate at higher simulacra levels than in the past, you'd be more inclined to think we'll sooner or later sleepwalk into a disaster. If you think there is a strong, consistent System 1 drag away from disaster, as I argued way back here, you'd expect strong control-system effects that seem surprisingly immune to 'fatigue'.

Comment by sdm on New SARS-CoV-2 variant · 2020-12-22T00:00:04.400Z · LW · GW

Update: this from Public Health England explicitly says Rt increases by 0.57: https://twitter.com/DevanSinha/status/1341132723105230848?s=20

"We find that Rt increases by 0.57 [95%CI: 0.25-1.25] when we use a fixed effect model for each area. Using a random effect model for each area gives an estimated additive effect of 0.74 [95%CI: 0.44- 1.29].

an area with an Rt of 0.8 without the new variant would have an Rt of 1.32 [95%CI:1.19-1.50] if only the VOC was present."

But for R, if it's 0.6 rather than 0.8 and the ratio is fixed, then another March-style lockdown in the UK would give R = 0.6 × (1.32/0.8) = 0.99.

Comment by sdm on New SARS-CoV-2 variant · 2020-12-21T20:54:51.473Z · LW · GW

EDIT: doubling time would go from 17 days to 4 days (!) with the above change of numbers. This doesn't fit given what is currently observed.

The doubling time for the new strain does appear to be around 6-7 days. And the doubling time for London overall is currently 6 days.

If the 'mitigated Rt is +0.66' and 'the growth rate is +71%' figures are inconsistent with each other, as you say, then perhaps the second is mistaken and +71% means that the Rt is 71% higher, not the case growth rate - which is vaguely consistent with the 'Rt is 58% higher' estimate from the absolute increase. Or '71% higher daily growth rate' could be right, and the +0.66 could be referring to the R0, as you say.

This does appear to have been summarized as 'the new strain is 71% more infectious' in many places, and many people have apparently inferred the R0 is >50% higher - hopefully we're wrong.

Computer modelling of the viral spread suggests the new variant could be 70 per cent more transmissible. The modelling shows it may raise the R value of the virus — the average number of people to whom someone with Covid-19 passes the infection — by at least 0.4,

I think this is what happens when people don't show their work.

So either 'R number' is actually referring to R0 and not Rt, or 'growth rate' isn't referring to the daily growth rate but to the Rt/R0. I agree that the first is more plausible. All I'll say is that a lot of people are assuming the 70% figure, or something close to it, is a direct multiplier on the Rt, including major news organizations like the Times and the FT. But I think you're probably right and the R0 is more like 15% larger, not 58-70% higher.

EDIT: New info from PHE seems to contradict this, https://t.co/r6GOyXFDjh?amp=1

Comment by sdm on New SARS-CoV-2 variant · 2020-12-21T20:22:49.908Z · LW · GW

EDIT: PHE has seemingly confirmed the higher estimate for change in R, ~65%. https://t.co/r6GOyXFDjh?amp=1

What, uh, does the "71% higher growth rate" mean

TLDR: I think that it's probably barely 15% more infectious and the math of spread near equilibrium amplifies things.

I admit that I have not read all available  documents in detail, but I presume that what they said means something like "if ancestor has a doubling time of X, then variant is estimated as having a doubling time of X/(1+0.71) = 0.58X"

In the meeting minutes, the R-value (Rt) was estimated to have increased by between 0.39 and 0.93, the central estimate being +0.66 - 'an absolute increase in the R-value of between 0.39 to 0.93'. Then we see 'the growth rate is 71% higher than other variants'. You're right that this is referring to the case growth rate - they're saying the daily increase is 1.71 times higher, possibly?

I'm going to estimate the relative difference in Rt between the two strains from the absolute difference they provided - the relative difference in Rt (Rt(new covid now)/Rt(old covid now)) in the same region should, I think, be the factor that tells us how much more infectious the new strain is.

We need to know what the pre-existing, current, Rt of just the old strain of covid-19 is. Current central estimate for covid in the UK overall is 1.15. This guess was that the 'old covid' Rt was 1.13.

0.66 + 1.13 = 1.79 (Rt of new covid now), and 1.79 (Rt of new covid now) / 1.13 (Rt of old covid now) = 1.58, which implies that the Rt of the new covid is currently 58% higher than the old - and that should be a constant factor, unless I'm missing something fundamental. (For what it's worth, the Rt in London, where the new strain makes up the majority of cases, is close to that 1.79 value.) So the Rt and the R0 of the new covid are 58% higher - that would make the R0 somewhere around 4.5-5.
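Spelling that arithmetic out (the old-variant R0 of ~3 at the end is an assumption of mine, for illustration only):

```python
# PHE's central additive estimate (+0.66) on top of an assumed current
# old-variant Rt of 1.13.
old_rt = 1.13
new_rt = old_rt + 0.66                    # ~1.79, current Rt of the new variant
relative_increase = new_rt / old_rt
print(round(relative_increase, 2))        # ~1.58, i.e. ~58% more transmissible

# If that ~58% factor carries over to R0 (old-variant R0 taken as ~3 here,
# purely as an assumption), the new variant's R0 would be roughly:
print(round(3.0 * relative_increase, 1))  # ~4.7, in the 4.5-5 range mentioned
```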

Something like that rough conclusion was also reached e.g. here or here or here or here or here, with discussion of 'what if the R0 was over 5' or '70% more infectious' or 'Western-style lockdown will not suppress' (though those may be confusing the daily growth rate with the R0). This estimate from different data said the Rt was 1.66/1.13, i.e. 47% higher, which is close-ish to the 58% estimate.

I may have made a mistake somewhere here, and those sources may have made the same mistake, but this seems inconsistent with your estimate that the new covid is 15% more infectious, i.e. that the Rt and R0 are 15% higher, not 58% higher.

This seems like a hugely consequential question. If the Rt of the new strain is more than ~66% larger than the Rt of the old strain, then March-style lockdowns which reduced Rt to 0.6 will not work, and the covid endgame will turn into a bloody managed retreat, to delay the spread and flatten the curve for as long as possible while we try to vaccinate as many people as possible. Of course, we should just go faster regardless:

Second, we do have vaccines and so in any plausible model faster viral spread implies a faster timetable for vaccine approval and distribution.  And it implies we should have been faster to begin with. If you used to say “we were just slow enough,” you now have to revise that opinion and believe that greater speed is called for, both prospectively and looking backwards. In any plausible model.

If you are right then this is just a minor step up in difficulty.

Tom Chivers agrees with you that this is an 'amber light'; Metaculus seems undecided (the probability of a UK second wave worse than the first increased by 20% to 42% when this news appeared); and some of the forecasters seem to agree with you or be uncertain.

Comment by sdm on Covid 12/17: The First Dose · 2020-12-18T19:19:39.528Z · LW · GW

On the economic front, we would have had to choose either to actually suppress the virus, in which case we get much better outcomes all around, or to accept that the virus couldn’t be stopped, *which also produces better economic outcomes*.

Our technological advancement gave us the choice to make massively larger Sacrifices to the Gods rather than deal with the situation. And as we all know, choices are bad. We also are, in my model, much more inclined to make such sacrifices now than we were in the past,

So, by 'Sacrifices to the Gods' I assume you're referring to the entirety of our suppression spending - because it's not all been wasted money, even if a large part of it has. In other places you use that phrase to refer specifically to ineffective preventative measures.

'We also are, in my model, much more inclined to make such sacrifices now than we were in the past' - this is a very important point that I'm glad you recognise. There has been a shift in values such that we (as individuals as well as governments) are guaranteed to take the option of attempting to avoid getting the virus and sacrificing the economy, to a greater degree than in 1919 or 1350, because our society values human life and safety differently.

And realistically, if we'd approached this with pre-2020 values and pre-2020 technology, we'd have 'chosen' to let the disease spread and suffered a great deal of death and destruction - but that option is no longer open to us. For better, as I think, or for worse, as you think.

You can do the abstract cost-benefit calculation about whether the harms from our response to the disease have caused more damage than the disease itself, but it won't tell you anything about whether getting governments to stop lockdowns and suppression measures would be better or worse than having them keep trying. Robin Hanson directly confuses these two in his argument that we are over-preventing covid.

We see variations in both kinds of policy across space and time, due both to private and government choices, all of which seem modestly influenceable by intellectuals like Caplan, Cowen, and I...

But we should also consider the very real possibility that the political and policy worlds aren’t very capable of listening to our advice about which particular policies are more effective than others. They may well mostly just hear us say “more” or “less”, such as seems to happen in medical and education spending debates.

Here Hanson equivocates between (correctly) identifying the entire cost of COVID-19 prevention as due to 'both private and government choices' and then focusing on just 'the political and policy worlds' when asking whether we should argue for less prevention. The claim (which may or may not be true) that 'we are overall over-preventing covid relative to the abstract alternative where we don't' gets equated with 'therefore telling people to reduce overall spending on covid prevention will be beneficial in cost-benefit terms'.

Telling governments to spend less money is much more likely to work than ordering people to have different values. So making governments spend less on covid prevention diminishes their more effective preventative actions while doing very little about the source of most of the covid prevention spending (individual action).

Like-for-like comparisons where values are similar but policy is different (like Sweden and its neighbours) make it clear that, given the underlying values we have - which lead to the behaviours we have observed this year - the imperative 'prevent covid less' leads to outcomes that are worse across the board.

Or consider Sweden, which had a relatively non-panicky Covid messaging, no matter what you think of their substantive policies.  Sweden didn’t do any better on the gdp front, and the country had pretty typical adverse mobility reactions.  (NB: These are the data that you don’t see the “overreaction” critics engage with — at all.  And there is more where this came from.)

How about Brazil? While they did some local lockdowns, they have a denialist president, a weak overall response, and a population used to a high degree of risk.  The country still saw a gdp plunge and lots of collateral damage.  You might ponder this graph, causality is tricky and the “at what margin” question is trickier yet, but it certainly does not support what Bryan is claiming about the relevant trade-offs.

So, with the firm understanding that given the values we have, and the behaviour patterns we will inevitably adopt, telling people to prevent the pandemic less is worse economically and worse in terms of deaths, we can then ask the further, more abstract question that you ask - what if our values were different? That is, what if the option was available to us because we were actually capable of letting the virus rip.

I wanted to put that disclaimer in because discussing whether we have developed the right societal values is irrelevant for policy decisions going forward - but still important for other reasons. I'd be quite concerned if our value drift over the last century or so was revealed as overall maladapted, but it's important to talk about the fact that this is the question that's at stake when we ask if society is over-preventing covid. I am not asking whether lockdowns or suppression are worth it now - they are.

You seem to think that our values should be different; that it's at least plausible that signalling is leading us astray and causing us to overvalue the direct damage of covid, like lives lost, in place of concern for overall damage. Unlike Robin Hanson, though, you aren't recommending we attempt to tell people to go off and have different values - you're simply noting that you think our tendency to make larger sacrifices is a mistake.

...even when the trade-offs are similar, which ties into my view that simulacra and maze levels are higher, with a larger role played by fear of motive ambiguity. We might have been willing to do challenge trials or other actual experiments, and have had a much better handle on things quicker on many levels.

There are two issues here. One is that it's not at all clear whether the initial cost-benefit calculation about over-prevention is even correct. You don't claim to know if we are over-preventing in this abstract sense (compared to us having different values and individually not avoiding catching the disease), and the evidence that we are over-preventing comes from a Twitter poll of Bryan Caplan's extremely libertarian-inclined followers, whom he told to try as hard as possible to be objective in assessing pandemic costs because he was asking what 'the average American' would value (come on!). Tyler Cowen briefly alludes to how woolly the numbers are here: 'I don’t agree with Bryan’s numbers, but the more important point is one of logic'.

The second issue is whether our change in values is an aberration caused by runaway signalling, or reflects a legitimate, correct valuation of human life. Now, the fact that a lot of our prevention spending has been wasteful counts in favour of the signalling explanation, but on the other hand there's a ton of evidence that we, in the past, generally valued life too little. There's also the point that this seems like exactly the kind of case where a signalling explanation is hard to falsify, an issue I talked about here:

I worry that there is a tendency to adopt self-justifying signalling explanations, where an internally complicated signalling explanation that's hard to distinguish from a simpler 'lying' explanation, gets accepted, not because it's a better explanation overall but just because it has a ready answer to any objections. If 'Social cognition has been the main focus of Rationality' is true, then we need to be careful to avoid overusing such explanations. Stefan Schubert explains how this can end up happening:

I think the correct story is that the value shift has been both good and bad - valuing human life more strongly has been good, but along with that it's become more valuable to credibly fake valuing human life, which has been bad.

Comment by sdm on Commentary on AGI Safety from First Principles · 2020-11-25T16:28:27.402Z · LW · GW

Yeah - this is a case where how exactly the transition goes seems to make a very big difference. If it's a fast transition to a singleton, altering the goals of the initial AI is going to be super influential. But if instead there are many generations of AIs that over time come to make up the large majority of the economy and then effectively control everything, predictably altering how that goes seems a lot harder, at least.

Comparing the entirety of the Bostrom/Yudkowsky singleton intelligence explosion scenario to the slower more spread out scenario, it's not clear that it's easier to predictably alter the course of the future in the first compared to the second.

In the first, assuming you successfully set the goals of the singleton, the hard part is over and the future can be steered easily because there are, by definition, no more coordination problems to deal with. But in the first, a superintelligent AGI could explode on us out of nowhere with little warning and a 'randomly rolled utility function', so the amount of coordination we'd need pre-intelligence explosion might be very large.

In the second slower scenario, there are still ways to influence the development of AI - aside from massive global coordination and legislation, there may well be decision points where two developmental paths are comparable in terms of short-term usefulness but one is much better than the other in terms of alignment or the value of the long-term future. 

Stuart Russell's claim that we need to replace 'the standard model' of AI development is one such example - if he's right, a concerted push now by a few researchers could alter how nearly all future AI systems are developed, for the better. So different conditions have to be met for it to be possible to predictably alter the future long in advance on the slow transition model (multiple plausible AI development paths that could be universally adopted and have ethically different outcomes) compared to the fast transition model (the ability to anticipate when and where the intelligence explosion will arrive and do all the necessary alignment work in time), but it's not obvious to me that one is easier to meet than the other.

 

For this reason, I think it's unlikely there will be a very clearly distinct "takeoff period" that warrants special attention compared to surrounding periods.

I think the period AI systems can, at least in aggregate, finally do all the stuff that people can do might be relatively distinct and critical -- but, if progress in different cognitive domains is sufficiently lumpy, this point could be reached well after the point where we intuitively regard lots of AI systems as on the whole "superintelligent."

This might be another case (like 'the AI's utility function') where we should just retire the term as meaningless, but I think that 'takeoff' isn't always a strictly defined interval, especially if we're towards the medium-slow end. The start of the takeoff has a precise meaning only if you believe that RSI is an all-or-nothing property. In this graph from a post of mine, the light blue curve has an obvious start to the takeoff, where the gradient discontinuously changes - but what about the yellow line? There clearly is a takeoff, in that progress becomes very rapid, but there's no obvious start point. Yet there is still a period, very different from our current period, that is reached in a relatively short space of time - so not 'very clearly distinct', but still 'warrants special attention'.

 

At this point I think it's easier to just discard the terminology altogether. For some agents, it's reasonable to describe them as having goals. For others, it isn't. Some of those goals are dangerous. Some aren't. 

Daniel Dennett's intentional stance is either a good analogy for the problem of "can't define what has a utility function" or just a rewording of the same issue. Dennett's original formulation doesn't discuss different types of AI systems or utility functions, ranging in 'explicit goal-directedness' all the way from expected-minimax game players to deep RL to purely random agents, but instead discusses physical systems ranging from thermostats up to humans. Either way, if you agree with Dennett's formulation of the intentional stance, I think you'd also agree that it doesn't make much sense to speak of 'the utility function' as necessarily well-defined.

Comment by sdm on Covid 11/19: Don’t Do Stupid Things · 2020-11-20T18:42:48.555Z · LW · GW

Much of Europe went into strict lockdown. I was and am still skeptical that they were right to keep schools open, but it was a real attempt that clearly was capable of working, and it seems to be working.

The new American restrictions are not a real attempt, and have no chance of working.

The way I understand it is that 'being effective' means making an efficient choice that takes into account asymmetric risk, the value of information, and the long-run trade-offs. This involves things like harsh early lockdowns, throwing endless money at contact tracing, and strict enforcement of isolation. Think Taiwan, South Korea.

Then 'trying' is adopting policies that have a reasonably good chance of working, but not having a plan if they don't work, not erring on the side of caution or taking asymmetric risk into account when you adopt the policies, and not responding to new evidence quickly. The schools thing is a perfect example - keeping schools open has costs (it makes the lockdown less effective and therefore longer), but it wasn't overwhelmingly clear that schools had to close to turn R under 1, so that was judged good enough. Partially funding tracing efforts, waiting until there's visibly no other choice and then calling a strict lockdown - that's 'trying'. Think the UK and France.

And then you have 'trying to try', which you explain in detail.

Dolly Parton helped fund the Moderna vaccine. Neat. No idea why anyone needed to do that, but still. Neat.

It's reassuring to know that if the administrative state and the pharmaceutical industry fails, we have Dolly Parton.

Comment by sdm on Some AI research areas and their relevance to existential safety · 2020-11-20T18:22:22.622Z · LW · GW

That said, I remain interested in more clarity on what you see as the biggest risks with these multi/multi approaches that could be addressed with technical research.

A reason (though not necessarily the most important one) to think technical research into computational social choice might be useful is that examining the behaviour of RL agents specifically from a computational social choice perspective might alert us to ways in which coordination with future TAI could be similar to, or different from, the existing coordination problems we face.

(i) make direct improvements in the relevant institutions, in a way that anticipates the changes brought about by AI but will most likely not look like AI research, 

It seems premature to say, in advance of actually seeing what such research uncovers, whether the relevant mechanisms and governance improvements are exactly the same as the improvements we need for good governance generally, or different. Suppose examining the behaviour of current RL agents in social dilemmas leads to a general result which in turn leads us to conclude there's a disproportionate chance TAI in the future will coordinate in some damaging way that we can resolve with a particular new regulation. It's always possible to say, solving the single/single alignment problem will prevent anything like that from happening in the first place, but why put all your hopes on plan A, when plan B is relatively neglected?

Comment by sdm on Some AI research areas and their relevance to existential safety · 2020-11-20T18:10:34.056Z · LW · GW

Thanks for this long and very detailed post!

The MARL projects with the greatest potential to help are probably those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment, because of its potential to minimize destructive conflicts between fleets of AI systems that cause collateral damage to humanity.  That said, even this area of research risks making it easier for fleets of machines to cooperate and/or collude at the exclusion of humans, increasing the risk of humans becoming gradually disenfranchised and perhaps replaced entirely by machines that are better and faster at cooperation than humans.

In ARCHES, you mention that just examining the multiagent behaviour of RL systems (or other systems that work as toy/small-scale examples of what future transformative AI might look like) might enable us to get ahead of potential multiagent risks, or at least try to predict how transformative AI might behave in multiagent settings. The way you describe it in ARCHES, the research would be purely exploratory,

One approach to this research area is to continually examine social dilemmas through the lens of whatever is the leading AI development paradigm in a given year or decade, and attempt to classify interesting behaviors as they emerge. This approach might be viewed as analogous to developing “transparency for multi-agent systems”: first develop interesting multi-agent systems, and then try to understand them. 

But what you're suggesting in this post, 'those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment', sounds like combining computational social choice research with multiagent RL -  examining the behaviour of RL agents in social dilemmas and trying to design mechanisms that work to produce the kind of behaviour we want. To do that, you'd need insights from social choice theory. There is some existing research on this, but it's sparse and very exploratory.

My current research is attempting to build on the second of these.

As far as I can tell, that's more or less it in terms of examining RL agents in social dilemmas, so there may well be a lot of low-hanging fruit and interesting discoveries to be made. If the research is specifically about finding ways of achieving cooperation in multiagent systems by choosing the correct (e.g. voting) mechanism, is that not also computational social choice research, and therefore of higher priority by your metric?
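
As a concrete picture of the kind of toy setting this line of work examines - a rough sketch of my own, not anything from ARCHES or the post - here are two independently trained learners in a repeated prisoner's dilemma, each treating the other as part of its environment (the payoff numbers and learning rule are arbitrary illustrative choices):

```python
import random

# Row player's payoffs in a prisoner's dilemma; action 0 = cooperate, 1 = defect.
PAYOFF = [[3, 0],   # I cooperate: (other cooperates, other defects)
          [5, 1]]   # I defect:    (other cooperates, other defects)

def train(episodes=20000, eps=0.1, lr=0.1):
    """Two independent, stateless epsilon-greedy learners repeatedly play
    the dilemma - 'decentrally trained agents in a competitive task
    environment' stripped to a bare minimum. They typically converge on
    mutual defection, the classic destructive equilibrium."""
    q = [[0.0, 0.0], [0.0, 0.0]]                      # q[agent][action]
    for _ in range(episodes):
        acts = [random.randrange(2) if random.random() < eps
                else max((0, 1), key=lambda a: q[i][a]) for i in range(2)]
        rewards = [PAYOFF[acts[0]][acts[1]], PAYOFF[acts[1]][acts[0]]]
        for i in range(2):
            q[i][acts[i]] += lr * (rewards[i] - q[i][acts[i]])
    return q

print(train())  # defection's estimated value typically ends up higher for both agents
```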

In short, computational social choice research will be necessary to legitimize and fulfill governance demands for technology companies (automated and human-run companies alike) to ensure AI technologies are beneficial to and controllable by human society.  

...

CSC neglect:

As mentioned above, I think CSC is still far from ready to fulfill governance demands at the ever-increasing speed and scale that will be needed to ensure existential safety in the wake of “the alignment revolution”. 

Comment by sdm on The 300-year journey to the covid vaccine · 2020-11-10T13:03:45.396Z · LW · GW

The remedies for all our diseases will be discovered long after we are dead; and the world will be made a fit place to live in, after the death of most of those by whose exertions it will have been made so. It is to be hoped that those who live in those days will look back with sympathy to their known and unknown benefactors.

— John Stuart Mill, diary entry for 15 April 1854

Comment by sdm on AGI safety from first principles: Goals and Agency · 2020-11-02T18:00:56.649Z · LW · GW

Furthermore, we should take seriously the possibility that superintelligent AGIs might be even less focused than humans are on achieving large-scale goals. We can imagine them possessing final goals which don’t incentivise the pursuit of power, such as deontological goals, or small-scale goals. 

...

My underlying argument is that agency is not just an emergent property of highly intelligent systems, but rather a set of capabilities which need to be developed during training, and which won’t arise without selection for it

Was this line of argument inspired by Ben Garfinkel's objection to the 'classic' formulation of instrumental convergence/orthogonality - that these are 'measure based' arguments that just identify that a majority of possible agents with some agentive properties and large-scale goals will optimize in malign ways, rather than establishing that we're actually likely to build such agents?

It seems like you're identifying the same additional step that Ben identified, and that I argued could be satisfied - that we need a plausible reason why we would build an agentive AI with large-scale goals.

And the same applies for 'instrumental convergence' - the observation that most possible goals, especially simple goals, imply a tendency to produce extreme outcomes when ruthlessly maximised:

  • A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  

We could see this as marking out a potential danger - a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist 'weakly suggest' (Ben's words) that AGI poses an existential risk since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we're 'shooting into the dark' in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world. There are specific reasons to think this might occur (e.g. mesa-optimisation, sufficiently fast progress preventing us from course-correcting if there is even a small initial divergence) but those are the reasons that combine with instrumental convergence to produce a concrete risk, and have to be argued for separately.

Comment by sdm on SDM's Shortform · 2020-10-30T17:04:06.134Z · LW · GW

I think that the notion of Simulacra Levels is both useful and important, especially when we incorporate Harry Frankfurt's idea of Bullshit

Harry Frankfurt's On Bullshit seems relevant here. I think it's worth trying to incorporate Frankfurt's definition as well, as it is quite widely known - see e.g. this video. If you were to do so, I think you would say that on Frankfurt's definition, Level 1 tells the truth, Level 2 lies, Level 3 bullshits about physical facts but will lie or tell the truth about things in the social realm (e.g. others' motives, your own affiliation), and Level 4 always bullshits.

How do we distinguish lying from bullshit? I worry that there is a tendency to adopt self-justifying signalling explanations: an internally complicated signalling explanation that's hard to distinguish from a simpler 'lying' explanation gets accepted, not because it's a better explanation overall but just because it has a ready answer to any objections. If 'Social cognition has been the main focus of Rationality' is true, then we need to be careful to avoid overusing such explanations. Stefan Schubert explains how this can end up happening:

...

It seems to me that it’s pretty common that signalling explanations are unsatisfactory. They’re often logically complex, and it’s tricky to identify exactly what evidence is needed to demonstrate them.

And yet even unsatisfactory signalling explanations are often popular, especially with a certain crowd. It feels like you’re removing the scales from our eyes; like you’re letting us see our true selves, warts and all. And I worry that this feels a bit too good to some: that they forget about checking the details of how the signalling explanations are supposed to work. Thus they devise just-so stories, or fall for them.

This sort of signalling paradigm also has an in-built self-defence, in that critics are suspected of hypocrisy or naïveté. They lack the intellectual honesty that you need to see the world for what it really is, the thinking goes

Comment by sdm on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-30T14:28:12.882Z · LW · GW

It may well be a crux. An efficient 'tree search', or a similar goal-directed wrapper around a GPT-based system that can play a role in real-world open-ended planning (presumably planning for an agent to bring about outcomes in the real world via its text generation), would have to cover continuous action spaces and possible states containing unknown and shifting sets of possible actions (unlike the discrete and, relative to the real universe, small action space of Go, which is perfect for tree search), while running (or approximating) millions of primitive steps - individual text generations and exchanges - into the future, for long-term planning towards e.g. a multi-decade goal of the kind humans are capable of pursuing.

That sounds like a problem that's at least as hard as a language-model 'success probability predictor' GPT-N (probably with reward-modelling help, so it can optimize for a specific goal with its text generation). Though such a system would still be highly transformative, if it was human-level at prediction.

To clarify, this is Transformative not 'Radically Transformative' - transformative like Nuclear Power/Weapons, not like a new Industrial Revolution or an intelligence explosion.

I would expect tree search powered by GPT-6 to be probably pretty agentic.

I could imagine (if you found a domain with a fairly constrained set of actions and states, but one that involved text prediction somehow) that you could get agentic behaviour out of a tree search like the ones we currently have + GPT-N + an RL wrapper around the GPT-N. That might well be quite transformative - I could imagine it being very good for persuasion, for example.
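
To be concrete about the shape I have in mind, here is that idea reduced to a toy: a naive beam search over text continuations scored by some predicted-success function. Both callables are placeholders I've made up (`propose` standing in for a language model, `score` for a reward/success model) - nothing here calls a real system.

```python
import heapq
from typing import Callable, List

def plan_text(start: str,
              propose: Callable[[str], List[str]],  # placeholder: an LM suggesting continuations
              score: Callable[[str], float],        # placeholder: predicted success probability
              depth: int = 3, beam: int = 4) -> str:
    """Naive beam search over text continuations - the 'tree search +
    GPT-N + reward model' wrapper stripped down. Note how little of the
    hard part lives here: all the intelligence hides inside the callables."""
    frontier = [start]
    for _ in range(depth):
        candidates = [c for text in frontier for c in propose(text)]
        if not candidates:
            break
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)
```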

Comment by sdm on Open & Welcome Thread – October 2020 · 2020-10-30T13:45:47.316Z · LW · GW

I don't know Wei Dai's specific reasons for having such a high level of concern, but I suspect that they are similar to the arguments given by the historian Niall Ferguson in this debate with Yascha Mounk on how dangerous 'cancel culture' is. Ferguson likes to try and forecast social and cultural trends years in advance and thinks that he sees a cultural-revolution like trend growing unchecked.

Ferguson doesn't give an upper bound on how bad he thinks things could get, but he thinks 'worse than McCarthyism' is reasonable to expect over the next few years, because he thinks that 'cancel culture' has more broad cultural support and might also gain hard power in institutions.

Now - I am more willing to credit such worries than I was a year ago, but there's a vast gulf between a trend being concerning and expecting another Cultural Revolution. It feels too much like a direct linear extrapolation fallacy - 'things have become worse over the last year, imagine if that keeps on happening for the next six years!' I wasn't expecting a lot of what happened over the last eight months in the US on the 'cancel culture' side, but I think that a huge amount of this is due to a temporary, Trump- and Covid- and Recession-related heating up of the political discourse, not a durable shift in soft power or people's opinions. I think the opinion polls back this up. If I'm right that this will all cool down, we'll know in another year or so.

I also think that Yascha's arguments in that debate about the need for hard institutional power that's relatively unchecked, to get a Cultural-Revolution like outcome, are really worth considering. I don't see any realistic path to that level of hard, governmental power at enough levels being held by any group in the US.

Comment by sdm on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-29T20:44:59.458Z · LW · GW

I think that it could plausibly be quite transformative in a TAI sense and occur over the next ten years, so perhaps we don't have all that much of a disagreement on that point. I also think (just because we don't have an especially clear idea of  how modular intelligence is) that it could be quite uniform and a text predictor could surprise us with humanlike planning.

Maybe the text predictor by itself wouldn't be an agent, but the text predictor could be re-trained as an agent fairly easily, or combined into a larger system that uses tree search or something and thus is an agent. 

This maybe reflects a difference in intuition about how difficult agentive behaviour is to reach, rather than language understanding. I would expect a simple tree search algorithm powered by GPT-6 to be... a model with humanlike language comprehension and incredibly dumb agentive behaviour, and that it wouldn't be able to leverage the 'intelligence' of the language model in any significant way, because I see that as a separate problem requiring separate, difficult work. But I could be wrong.

I think there is a potential bias in that human-like language understanding and agentive behaviour have always gone together in human beings - we have no idea what a human-level language model that wasn't human-level intelligent would be like. Since we can't imagine it, we tend to default to imagining a human-in-a-box. I'm trying to correct for this bias by imagining that it might be quite different.

Comment by sdm on Covid Covid Covid Covid Covid 10/29: All We Ever Talk About · 2020-10-29T20:18:28.926Z · LW · GW

If you are keeping schools open in light of the graphs above, and think you are not giving up, I don’t even know how to respond.

I think the French lockdown probably won't work without school closures, and this will probably be noticed soon, when the data comes through establishing that it doesn't work. I also think it's extremely dumb not to close schools, given that the risk for closing vs not closing at this point is extremely asymmetric. But this isn't 'giving up' knowingly (I infer that you're suggesting Macron may be trying to show that he is trying while actually giving up) - this is simply Macron and his cabinet not intuitively understanding asymmetric risk, and not realizing that it's much better to do far more than what was sufficient than to do something that just stands an okay chance of being sufficient to suppress, in order to avoid costs later.

I think that there is a current tendency - and I see it in some of your statements about the beliefs of the 'doom patrol' - to use signalling explanations almost everywhere, and sometimes that shades into accepting a lower burden of proof, even if the explanation doesn't quite fit. For example, the European experience over the summer is mostly a story of a hideous but predictable failure to understand the asymmetric risk and costs of opening up / investing more vs less in tracing, testing and enforcement.

Signalling plays a role in explaining this irrationality, certainly, but as I explained in last week's comment wedging everything into a box of 'signalling explanations' doesn't always work. Maybe it makes more sense in the US, where the coronavirus response has been much more politicised. Stefan Schubert has a great blog post on this tendency:

It seems to me that it’s pretty common that signalling explanations are unsatisfactory. They’re often logically complex, and it’s tricky to identify exactly what evidence is needed to demonstrate them.

And yet even unsatisfactory signalling explanations are often popular, especially with a certain crowd. It feels like you’re removing the scales from our eyes; like you’re letting us see our true selves, warts and all. And I worry that this feels a bit too good to some: that they forget about checking the details of how the signalling explanations are supposed to work. Thus they devise just-so stories, or fall for them.

This sort of signalling paradigm also has an in-built self-defence, in that critics are suspected of hypocrisy or naïveté. They lack the intellectual honesty that you need to see the world for what it really is, the thinking goes

I think that a few of your explanations fall into this category.

They’re pushing the line that even after both of you have an effective vaccine you still need to socially distance.

Isn't this... true? Given that an effective vaccine will take time to distribute (best guess 25 million doses by early next spring), and that there will be a long period where we're approaching herd immunity and the risk is steadily decreasing as more people become immune, Fauci is probably worried about people risk-compensating during this interval. So he's trying to emphasise that a vaccine won't be perfectly protective and might take a while, maybe exaggerating both claims, while not outright lying. I agree that this type of thinking can shade into doom-mongering and sometimes outright lying about how long vaccines might take, but this seems like solidly consequentialist lying to promote social distancing (SL 2), not bullshitting (SL 3). Maybe they've gotten the behavioural response wrong, and it's much better to be truthful, clear and give people reasonable hope (I think it is), but that's a difference in strategy, not pure SL3 bullshit. Why are you so confident that it's the latter?

I don’t think this is something being said in order to influence behavior, or even to influence beliefs. That is not the mindset we are dealing with at this point. It’s not about truth. It’s not about consequentialism. We have left not only simulacra level 1 but also simulacra level 2 fully behind. It’s about systems that instinctively and continuously pull in the direction of more fear, more doom, more warnings, because that is what is rewarded and high status and respectable and serious and so on, whereas giving people hope of any kind is the opposite. That’s all this is.

That's a bold claim to make about someone with a history like Fauci's, and since 'the priority with first vaccinations is to prevent symptoms and preventing infection is a bonus' is actually true, if misleading, I don't think it's warranted.

This just sounds exactly like generic public health messaging aimed at getting people to wear masks now by making them not focus on the prospect of a vaccine. Plus it might even be important to know, especially when you consider that vaccination will happen slowly and Fauci doesn't want people to risk-compensate after some people around them have been vaccinated but they haven't been. I don't think Fauci is thinking beyond saying whatever he needs to say to drive up mask compliance right now, which is SL 2. Your explanation - that Dr Fauci has lost track of whether or not vaccines actually prevent infection - might be true, but it strikes me as weird and confusing, something you'd expect of a more visibly disordered person, and the kind of thing you'd need more evidence for than what he said in that little clip. I think those explanations absolutely have their place, especially for explaining some horrible public health messaging by some politicians, public-facing experts and most of the media, but I think this particular example is an overuse of signalling explanations in the way described in the article I linked above. At the very least, the SL2 consequentialist-lying explanation is simpler and has a plausible story behind it, so I don't know why you'd go for the less clear SL3 explanation with apparent certainty.

Essentially, Europe chose to declare victory and leave home without eradication, and the problem returned, first slowly, now all at once, as it was bound to do without precautions.

We did take plenty of precautions, they were just wholly inadequate relative to the potential damage of a second wave. A lot of this was not understanding the asymmetric risk. Most of Europe had precautions that might work and testing and tracing systems that were catching some of the infected and various shifting rules about social distancing and it was at least unclear if they would be sufficient. I can't speak about other countries, but people in the UK were intellectually extremely nervous about the reopening and most people consistently polled saying it was too soon to reopen. For a while it worked - including in July when there was a brief increase in the UK that was reversed successfully. The number of people I see around me wearing masks has been increasing steadily ever since the start of the pandemic. So it was easy and convenient to say, 'it's a risk worth taking, it's worked out so far' at least for a while - even though any sane calculation of the risks should have said we ought to have invested vastly more than we did in testing, tracing, enforcement, supported isolation etc. even if things looked like they were under control.

Not that giving up is obviously the wrong thing to do! But that does not seem to be Macron’s plan.

...

We are going to lock you down if you misbehave, so if you misbehave all you’re doing is locking yourself down. She’s right, of course, that things will keep getting worse until we change the trajectory and make them start getting better, but no the interventions to regain control are exactly the same either way. You either get R below 1, or you don’t. Except that the more it got out of control first, the more voluntary adjustments you’ll see, and the more people will be immune, so the more out of control it gets the easier it is to control later. ...

And also the longer you wait, the longer you have to spend with stricter measures.

The measures don't need to be stricter unless you can't tolerate as long a period with high infection rates, in which case you need infection rates to go down much faster. I don't know if it makes me and Tyler Cowen and most epidemiologists part of the 'doom patrol' if we say that the longer you wait, the longer the interval of either voluntary behaviour change to avoid infection or lockdown you'll need.

(Note that I'm not denying that there are such doomers. Some of the things you mention, like people explicitly denying coronavirus treatment has made the disease less deadly and left hospitals much better able to cope, aren't really things in Europe or the UK and I was amazed to learn people in the US are claiming things that insane, but we have our own fools demanding pointless sacrifices - witness the recent ban Wales put on buying 'nonessential goods' within supermarkets)

If by 'giving up' you mean 'not changing the government mandated measures currently on offer to be more like a lockdown', given the situation France is in right now, it seems undeniably the wrong thing to do to rely on voluntary behaviour changes and hope that there's no spike that overwhelms hospitals (again, asymmetric risk!) - worse for the economy, lives and certainly for other knock-on effects like hospital overloading. A lot of estimations of the marginal cost of suppression measures completely miss the point that the costs and benefits just don't separate out neatly, as I argue here. Tyler Cowen:

I think back to when I was 12 or 13, and asked to play the Avalon Hill board game Blitzkrieg.  Now, as the name might indicate, you win Blitzkrieg by being very aggressive.  My first real game was with a guy named Tim Rice, at the Westwood Chess Club, and he just crushed me, literally blitzing me off the board.  I had made the mistake of approaching Blitzkrieg like chess, setting up my forces for various future careful maneuvers.  I was back on my heels before I knew what had happened.

Due to its potential for exponential growth, Covid-19 is more like Blitzkrieg than it is like chess.  You are either winning or losing (badly), and you would prefer to be winning.  A good response is about trying to leap over into that winning space, and then staying there.  If you find that current prevention is failing a cost-benefit test, that doesn’t mean the answer is less prevention, which might fail a cost-benefit test all the more, due to the power of the non-local virus multiplication properties to shut down your economy and also take lives and instill fear.

You still need to come up with a way of beating Covid back.

'Giving up' is not actually giving up. At least in Europe, given the state of public behaviour and opinion about the virus, 'giving up' just means Sweden's 'voluntary suppression' in practice. There is no outcome where we uniformly line up to variolate ourselves and smoothly approach herd immunity. The people who try to work out the costs and benefits of 'lockdowns' are making a meaningless, false comparison between 'normal economy' and 'lockdown':

First and foremost, the declaration does not present the most important point right now, which is to say October 2020: By the middle of next year, and quite possibly sooner, the world will be in a much better position to combat Covid-19. The arrival of some mix of vaccines and therapeutics will improve the situation, so it makes sense to shift cases and infection risks into the future while being somewhat protective now. To allow large numbers of people today to die of Covid, in wealthy countries, is akin to charging the hill and taking casualties two days before the end of World War I.

...

What exactly does the word “allow” mean in this context? Again the passivity is evident, as if humans should just line up in the proper order of virus exposure and submit to nature’s will. How about instead we channel our inner Ayn Rand and stress the role of human agency? Something like: “Herd immunity will come from a combination of exposure to the virus through natural infection and the widespread use of vaccines. Here are some ways to maximize the role of vaccines in that process.”

In that sense, as things stand, there is no “normal” to be found. An attempt to pursue it would most likely lead to panic over the numbers of cases and hospitalizations, and would almost certainly make a second lockdown more likely. There is no ideal of liberty at the end of the tunnel here.

In Europe, we will have more lockdowns. I'm not making the claim that this is what we should do, or that this is what's best for the economy given the dreadful situation we've landed ourselves in, or that this is what we'll almost certainly end up doing given political realities - though I think these are all true. What I'm saying is that, whether (almost certainly) by governments caving to political pressure or (if they hold out endlessly, like Sweden) by voluntary behaviour change, we'll shut down the economy in an attempt to avoid catching the virus. Anything else is inconceivable and would require lemming-like behaviour from politicians and ordinary people.

So, given that it's going to happen, would you rather it be chaotic and late and uncoordinated, or sharper and earlier and hopefully shorter? If we're talking about government policy, there really isn't all that much compromise on the marginal costs of lockdowns vs the economy to be had if you're currently in the middle of a sufficiently rapid acceleration.

Comment by sdm on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-29T18:58:56.093Z · LW · GW

I'm still a bit puzzled by the link between human level on text prediction and 'human level' unconditionally - if I recall our near-bet during the forecasting tournament, our major disagreement was on whether direct scaling of GPT-like systems takes us near to AGI. I think that (because we don't have direct experience with any verbal intelligences in capability between GPT-3 and human brains) we're often impoverished when trying to think about such intelligences. I imagine that a GPT-6 that is almost 'human level on text prediction' could still be extremely deficient in other areas - it would be very weird to converse with, maybe like an amnesiac or confabulator that's very articulate and with good short-term memory.

If language models scale to near-human performance but the other milestones don't fall in the process, and my initial claim is right, that gives us very transformative AI but not AGI. I think that the situation would look something like this:

If GPT-N reaches par-human, the breakthroughs still remaining would be:

  • discovering new action sets
  • managing its own mental activity
  • (?) cumulative learning

while these milestones would be covered:

  • human-like language comprehension
  • perception and object recognition
  • efficient search over known facts

So there would be 2 (maybe 3?) breakthroughs remaining. It seems like you think just scaling up a GPT will also resolve those other milestones, rather than just giving us human-like language comprehension. Whereas if I'm right and also those curves do extrapolate, what we would get at the end would be an excellent text generator, but it wouldn't be an agent, wouldn't be capable of long-term planning and couldn't be accurately described as having a utility function over the states of the external world, and I don't see any reason why trivial extensions of GPT would be able to do that either since those seem like problems that are just as hard as human-like language comprehension. GPT seems like it's also making some progress on cumulative learning, though it might need some RL-based help with that, but none at all on managing mental activity for longterm planning or discovering new action sets.

Comment by sdm on Security Mindset and Takeoff Speeds · 2020-10-29T18:47:16.938Z · LW · GW

In terms of inferences about deceptive alignment, it might be useful to go back to the one and only current example we have where someone with somewhat relevant knowledge was led to wonder whether deception had taken place - GPT-3 balancing brackets. I don't know if anyone ever got Eliezer's $1000 bounty, but the top-level comment on that thread at least convinces me that it's unlikely that GPT-3 via AI Dungeon was being deceptive even though Eliezer thought there was a real possibility that it was.

Now, this doesn't prove all that much, but one thing it does suggest is that on current MIRI-like views about how likely deception is, the threshold for uncertainty about deception is set far too low. That suggests your people at OpenSoft might well be right in their assumption.

Comment by sdm on Have the lockdowns been worth it? · 2020-10-20T13:45:48.275Z · LW · GW
  • How long do we expect to have to wait for a vaccine or much more effective treatment? 

I can't think of a better source on this than the Good Judgment project's COVID-19 recovery dashboard.

  • How does the economic and related damage vary for voluntary vs involuntary suppression?

This is incredibly complicated, country-specific and dependent on all sorts of factors, but maybe this graph from the Financial Times is a good place to start - it tells us how things have gone so far.

  • How does the total number and spread of infections vary for voluntary vs involuntary suppression? 

This is even harder than the previous question. All we can say for sure is 'It was possible to get R<1 in Sweden in the spring with less stringent measures'. If you consider that Sweden suffered considerably more death than its comparable neighbours, then you can project that the initial surge in deaths in badly-hit locked-down countries like the UK could have been much higher with voluntary measures, but how much higher is difficult to assess. I think that between-country comparisons are almost useless in these situations.

This is also where accounting for coronavirus deaths and debilitations comes into play. 'Anti-lockdown' arguments sometimes focus on the fact that even in badly-hit countries, the excess death figures have been in the rough range of +10% (though with around 11 years of life lost per death, on average). There are ways of describing this that make it seem 'not so bad' or 'not worth shutting the country down for', by e.g. comparing it to deaths from the other leading causes of death, like heart disease. This assumes there's a direct tradeoff where we can 'carry on as normal' while accepting those deaths and avoiding the economic damage, but there is no such tradeoff to be made. There's just the choice of which way you place the additional nudges of law and public messaging on top of a trajectory you're largely committed to by individual behaviour changes.

And if you do try to make the impossible, hypothetical 'trade off the economy against lives' comparison between 'normal behaviour no matter what' and virus suppression, then the number of excess deaths to use for comparison isn't the number we in fact suffered, but far higher: given an IFR of 0.5-1%, it's on the order of +100% excess deaths (600,000 in the UK and 2 million in the US).
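
A rough back-of-envelope version of that figure, with my own assumed inputs rather than anything from the linked posts:

```python
# Back-of-envelope check of the '+100% excess deaths' ceiling; all inputs are
# rough assumptions of mine (population sizes, attack rate, IFR range).
uk_pop, us_pop = 67e6, 330e6
attack_rate = 0.8                      # assumed share eventually infected with no suppression
for name, pop in [("UK", uk_pop), ("US", us_pop)]:
    low, high = pop * attack_rate * 0.005, pop * attack_rate * 0.01
    print(f"{name}: {low/1e6:.1f}M-{high/1e6:.1f}M deaths")
# UK: ~0.3M-0.5M against a normal ~0.55M deaths/year, i.e. up to roughly +100%;
# US: ~1.3M-2.6M, in line with the ~2 million figure above.
```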

But again, such a comparison isn't useful, as it's not a policy that could be enacted or adopted, in fact it would probably require huge state coercion to force people to return to 'normal life'.

The basic point that it wouldn't be worth sacrificing everything to reduce excess deaths by 10% and save a million life-years is true, but that point is turned into a motte-and-bailey, where the motte is that there exists a level of damage at which a particular suppression measure (full lockdowns) is no longer worth it, and the bailey is that in all the situations we are in now most suppression measures are not worth it.

  • To what degree do weaker legally mandated measures earlier spare us from stronger legally mandated measures (or greater economic damage from voluntary behaviour change) later?

This raises the difficult question of how much to take into account panic over overwhelmed hospitals and rising cases. Tyler Cowen:

In that sense, as things stand, there is no “normal” to be found. An attempt to pursue it would most likely lead to panic over the numbers of cases and hospitalizations, and would almost certainly make a second lockdown more likely.

Comment by sdm on The Treacherous Path to Rationality · 2020-10-19T17:07:22.990Z · LW · GW

The Rationality community was never particularly focused on medicine or epidemiology. And yet, we basically got everything about COVID-19 right and did so months ahead of the majority of government officials, journalists, and supposed experts.

...

We started discussing the virus and raising the alarm in private back in January. By late February, as American health officials were almost unanimously downplaying the threat, we wrote posts on taking the disease seriously, buying masks, and preparing for quarantine.

...

The rationalists pwned COVID

This isn't true. We did see it coming more clearly than most of the governmental authorities, and we were certainly ahead of public risk communication, but we were on average fairly similar to, or even a bit behind, the actual domain experts.

This article summarizes interviews with epidemiologists on when they first realized COVID-19 was going to be a huge catastrophe and how they reacted. The dates range from January 15th, with the majority in mid-to-late February. See also this tweet from late February, from a modeller working on the UK's SAGE, confirming he thinks uncontrolled spread is taking place.

I have an email dated 27 Feb 2020 replying to a colleague: "My thoughts on Covid-19 - pandemic is very likely." It was such a dry, intellectual statement, and I remember feeling incredulous that I could write those words with such ease and certainty while feeling total uncertainty and fear about how this could play out.

...

Two moments stand out for me. One was in the first week of February, when I saw early signals that there could be substantial transmission before people show symptoms. Despite hopes of rapid containment, it was clear contact tracing alone would struggle to contain outbreaks

...

On January 23, I was at an NIH meeting related to potential pandemic pathogen research. Everyone had heard the news [that Wuhan had been locked down] and was beginning to discuss whether this would be a big deal. Over the next several weeks the concern turned more grave.

I believe February 27th was the same day as 'Seeing the Smoke', when it became accepted wisdom around here that coronavirus would be a huge catastrophe. Feb 27th was a day before I said I thought this would be a test-run for existential risk. And late January, we were in the same position as the NIH of 'beginning to discuss whether this would be a big deal' without certainty. The crucial difference was understanding the asymmetric risk - A failure, but not of prediction.

So why didn't the domain experts do anything, if so? I've been reading the book Rage by Bob Woodward, which includes interviews with Fauci and other US officials from January and February. From as early as the end of December, there was a constant emphasis on how demanding strict measures early would be 'useless' and achieve nothing!

I'm growing to think that a lot of health experts had an implicit understanding that the systems around them in the west were not equipped to carry out their best plans of action. In other words, they saw the smoke under the door, decided that if they yelled 'fire' before it had filled up the room nobody would believe them and then decided to wait a bit before yelling 'fire'. But since we weren't trying to produce government policy, we weren't subject to the same limitations.

Comment by sdm on Have the lockdowns been worth it? · 2020-10-19T16:43:04.491Z · LW · GW

An important consideration is that the 'thing that the US, UK and China have been doing, and what Sweden didn’t', may not refer to anything. There are two meanings of 'lockdowns have not been worth it' - 'allow the natural herd immunity to happen and carry on as normal, accepting the direct health damage while saving the economy' or 'we shouldn't adopt legally mandatory measures to attempt to suppress the virus and instead adopt voluntary measures to attempt to suppress the virus'. The latter of these is the only correct way to interpret 'thing Sweden did that the other countries didn't'. The first of these is basically a thought-experiment, not a possible state of affairs, because people won't carry on as usual. So it can't be used for cost-benefit comparisons.

In terms of behaviour, there is far more similarity between what the US and Sweden 'did' than what the US and China 'did'. Tyler Cowen has written several articles emphasising exactly this point. What Sweden 'did' was an uncoordinated, voluntary attempt at the same policy that China, Germany, the UK and the US attempted with varying levels of seriousness - social distancing to reduce the R effectively below 1, suppressing the epidemic. This thread summarizes the 'voluntary suppression' that countries like Sweden ended up with. Tyler Cowen writes an article attempting to 'right the wrong question':

"The most compassionate approach that balances the risks and benefits of reaching herd immunity, is to allow those who are at minimal risk of death to live their lives normally to build up immunity to the virus through natural infection, while better protecting those who are at highest risk. We call this Focused Protection."

What exactly does the word “allow” mean in this context? Again the passivity is evident, as if humans should just line up in the proper order of virus exposure and submit to nature’s will. How about instead we channel our inner Ayn Rand and stress the role of human agency? Something like: “Herd immunity will come from a combination of exposure to the virus through natural infection and the widespread use of vaccines. Here are some ways to maximize the role of vaccines in that process.”

So, the question cannot be "should we allow the natural herd immunity to happen and carry on as normal, accepting the direct health damage while protecting the economy" - that is not actually a possible state of affairs given human behaviour. We can ask whether a better overall outcome is achieved with legally required measures to attempt suppression, rather than an uncoordinated attempt at suppression, but since people will not carry on as normal we can't ask 'has the economic/knock-on cost of lockdowns been worth the lives saved' without being very clear that the counterfactual may not be all that different.

The most important considerations have to be,

  • How long do we expect to have to wait for a vaccine or much more effective treatment? If not long, then any weaker suppression is 'akin to charging the hill and taking casualties two days before the end of World War I'. If a long time, then we must recognise that in e.g. the US a slow grind up to herd immunity through infection will eventually occur anyway.
  • How does the economic and related damage vary for voluntary vs involuntary suppression? The example of Sweden compared to its neighbours is illustrative here.
  • How does the total number and spread of infections vary for voluntary vs involuntary suppression? You can't rerun history for a given country with vs without legally mandated suppression measures.
  • To what degree do weaker legally mandated measures earlier spare us from stronger legally mandated measures (or greater economic damage from voluntary behaviour change) later?
  • Edit: Tyler Cowen released another article arguing for a new consideration that I didn't list - what reference class to place Coronavirus in: 'external attack on the nation' or 'regular cause of death'. This matters because, for fairly clear rule-utilitarian/deontological reasons, governments should care more about defending their citizens from e.g. wars and terrorist attacks than about random accidents that kill similar numbers of people. I also think this is a key disagreement between pro- and anti-'lockdown' positions.

To emphasise this last point, although it falls under 'questioning the question', the focus on lockdowns can be counterproductive when there are vastly more cost-effective measures that could have been attempted by countries like the UK that had very low caseloads through the summer - like funding enforcement of and support for isolation, better contact tracing, mask enforcement, and keeping events outdoors. These may fall under some people's definition of 'lockdown', since some of them are legally mandatory social distancing, but their costs and benefits are wildly different from stay-at-home orders. Scepticism of 'lockdowns' needs to be made more specific than that.

Comment by sdm on Covid 10/15: Playtime is Over · 2020-10-17T15:13:25.876Z · LW · GW

The other group claims their goal is to save lives while preventing economic disaster. In practice, they act as if their goal was to destroy as much economic and social value as possible in the name of the pandemic as a Sacrifice to the Gods, and to pile maximum blame upon those who do not go along with this plan, while doing their best to slow down or block solutions that might solve the pandemic without sufficiently destroying economic or social value.

There are less cynical ways to view countermeasures that go too far. I'd compare it, especially early on, to many of us developing mild OCD because of how terrifying things were - compliance was also very high early on.

they act as if their goal was to destroy as much economic and social value as possible in the name of the pandemic as a Sacrifice to the Gods

...

they act as if their goal was to have everyone ignore the pandemic, actively flouting all precautions

A lot of the response in Europe/UK has not looked like this, or like your opposite side but it still hasn't been very good.

The UK/Europe response has been more like an inefficient, clumsy attempt to strike a 'balance' between mitigation and saving the economy, while showing no understanding of how to make good tradeoffs - e.g. opening the universities while banning small gatherings. It looks more like an attempt to do all the 'good' things at once for the economy and health and get the reputational/mood-affiliation benefits of both. E.g. in the UK in summer, after suppressing the virus hard and at great cost, we half-funded the tracing and isolation infrastructure, ignored that compliance was low, gave subsidies to people eating out at pubs and restaurants, and now might be employing incredibly costly lockdown measures again when we could have fully squashed the virus with a bit of extra effort in the summer when numbers were almost zero - and that's the same story as most of Europe.

That's more a failure to understand/respond to opportunity costs than either of the failures you describe, though it has aspects of both. It doesn't look like they were acting with the goal of getting people to adhere to the costliest measures possible, though - witness the reluctance to reimpose restrictions now.

The pandemic has enough physical-world, simulacra-level-1 impact on people to steer most ordinary people’s individual physical actions towards what seems to them like useful ones that preserve economic and social value while minimizing health risks. And it manages to impose some amount of similar restrictions on the collective and rhetorical actions.

This is the part that I like to emphasise, and the reason that we're still bound for a better outcome than most March predictions implied is because of a decent level of public awareness of risk imposing a brake on the very worst outcomes - the Morituri Nolumus Mori. Many of us didn't properly anticipate how much physical reality would end up hemming in our actions, as I explained in that post.

That doesn’t mean equivalence between sides, let alone equivalence of individuals. But until the basic dynamics are understood, one can’t reasonably predict what will happen next.

This is also worth emphasising. In general, though not in the examples you mention from e.g. California, going too hard works better than going too soft because there just is no pure 'let it rip' option - there's a choice between coordinated and uncoordinated suppression. It looks like voluntary behaviour has (in Europe and the US) mattered relatively more than expected. Countries that relied on voluntary behaviour change like Sweden didn't have the feared uncontrolled spread but also didn't do that well - they ended up with a policy of effective ‘voluntary suppression’ with a slightly different tradeoff – economic damage slightly less than others, activity reduction slower and more chaotic, more deaths. This was essentially a collective choice by the Swedish people despite their government.

that’s probably not true, and probably not true sooner rather than later. Immunity and testing continue to increase, our treatments continue to improve, and vaccines are probably on their way on a timescale of months. Despite the best efforts of both camps, it would greatly surprise me if we are not past the halfway point.

The initial estimates said that 40-50% infected is a reasonable lower bound for when weak mitigation plus partial herd immunity would end the pandemic naturally. I think that's still true. So, it would all have been 'worth it', in pure death terms, if significantly fewer than that many people end up catching coronavirus before much better treatments or vaccines end the epidemic by other means. Last time I checked that's still likely.
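
(For reference, the textbook homogeneous-mixing threshold is 1 - 1/R0, which for commonly cited R0 estimates sits a bit above that 40-50% band - heterogeneous mixing is one reason to treat that figure as a lower bound. A quick check, with R0 values that are just rough assumptions of mine:)

```python
# Classic herd immunity threshold under homogeneous mixing: 1 - 1/R0.
for r0 in (2.0, 2.5, 3.0):
    print(f"R0={r0}: threshold ~{1 - 1/r0:.0%}")
# R0=2.0: ~50%, R0=2.5: ~60%, R0=3.0: ~67%. Heterogeneity in contact rates
# can push the effective threshold lower, hence 40-50% as a lower bound.
```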

Comment by sdm on A voting theory primer for rationalists · 2020-08-31T16:51:54.777Z · LW · GW
You seem to be comparing Arrow's theorem to Lord Vetinari, implying that both are undisputed sovereigns?

It was a joke about how, if you take Arrow's theorem literally, the 'fairest' voting method - at least among ranked methods, the only rule that produces a definite transitive preference ranking while meeting the unanimity and independence conditions - is 'one man, one vote', i.e. a dictatorship in which a single person's vote decides everything.
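
(For anyone who wants to see the underlying obstruction concretely: the classic three-voter Condorcet cycle, a minimal sketch of why pairwise majority rule - the natural 'fair' ranked method - fails to produce a transitive ranking, which is the kind of failure Arrow's conditions formalise. This is a standard textbook example, nothing specific to the paper under discussion.)

```python
# Classic Condorcet profile: three voters with 'rotated' rankings.
voters = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def majority_winner(x, y):
    wins_x = sum(r.index(x) < r.index(y) for r in voters)
    return x if wins_x > len(voters) / 2 else y

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} vs {y}: majority prefers {majority_winner(x, y)}")
# A beats B, B beats C, C beats A - pairwise majority yields no transitive
# ranking over the three options.
```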

And frankly, I think that the model used in the paper bears very little relationship to any political reality I know of. I've never seen a group of voters who believe "I would love it if any two of these three laws pass, but I would hate it if all three of them passed or none of them passed" for any set of laws that are seriously proposed and argued-for.

Such a situation doesn't seem all that far-fetched to me. Suppose there are three different stimulus bills on offer, and you want some stimulus spending but you also care about rising national debt. You might not care which particular bills pass - you want some stimulus money, but you don't want all of them to pass because the debt would rise too high - so maybe you decide that you just want any 2 out of 3 of them to pass. But I think the methods introduced in that paper might be most useful not for modelling the outcomes of voting systems, but for attempts to align an AI to multiple people's preferences.
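
(A toy version of that voter's preference, just to make the point that it isn't separable bill-by-bill - the bill names and payoffs are made up:)

```python
from itertools import combinations

BILLS = ("A", "B", "C")

def voter_utility(passed):
    """The voter from the example: wants some stimulus but not too much
    debt, so is happiest when exactly two of the three bills pass."""
    return 1 if len(passed) == 2 else 0

for r in range(len(BILLS) + 1):
    for outcome in combinations(BILLS, r):
        print(set(outcome) or "{}", voter_utility(outcome))
# No bill is good or bad in itself - the value of passing one depends
# entirely on which others pass, which simple per-issue voting can't express.
```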

Comment by sdm on Forecasting Thread: AI Timelines · 2020-08-29T11:11:04.240Z · LW · GW

I'll take that bet! If I do lose, I'll be far too excited/terrified/dead to worry in any case.

Comment by sdm on Covid 8/27: The Fall of the CDC · 2020-08-28T11:32:12.947Z · LW · GW
I’m still periodically scared in an existential or civilization-is-collapsing-in-general kind of way, but not in a ‘the economy is about to collapse’ or ‘millions of Americans are about to die’ kind of way. 
I’m not sure whether this is progress.

It definitely is progress. If we were in the latter situation, there would be nothing at all to do except hope you personally don't die, whereas in the former there's a chance for things to get better - if we learn the lesson.

By strange coincidence, it's exactly 6 months since I wrote this, and I think it's important to remember just how dire the subjective future seemed at the end of February - that (subjectively, anyway) could have happened, but didn't.

Comment by sdm on SDM's Shortform · 2020-08-28T10:50:18.165Z · LW · GW
The tl;dr is that instead of thinking of ethics as a single unified domain where "population ethics" is just a straightforward extension of "normal ethics," you split "ethics" into a bunch of different subcategories:
Preference utilitarianism as an underdetermined but universal morality
"What is my life goal?" as the existentialist question we have to answer for why we get up in the morning
"What's a particularly moral or altruistic thing to do with the future lightcone?" as an optional subquestion of "What is my life goal?" – of interest to people who want to make their life goals particularly altruistically meaningful

This is very interesting - I recall from our earlier conversation that you said you might expect some areas of agreement, just not on axiology:

(I say elements because realism is not all-or-nothing - there could be an objective 'core' to ethics, maybe axiology, and much ethics could be built on top of such a realist core - that even seems like the most natural reading of the evidence, if the evidence is that there is convergence only on a limited subset of questions.)

I also agree with that, except that I think axiology is the one place where I'm most confident that there's no convergence. :)
Maybe my anti-realism is best described as "some moral facts exist (in a weak sense as far as other realist proposals go), but morality is underdetermined."

This may seem like an odd question, but, are you possibly a normative realist, just not a full-fledged moral realist? What I didn't say in that bracket was that 'maybe axiology' wasn't my only guess about what the objective, normative facts at the core of ethics could be.

Following Singer in The Expanding Circle, I also think that some impartiality rule that leads to preference utilitarianism, maybe analogous to the anonymity rule in social choice, could be one of the normatively correct rules that ethics has to follow - but if convergence among ethical views doesn't occur, the final answer might be underdetermined. This seems to be exactly the same as your view, so maybe we disagree less than it initially seemed.


In my attempted classification (of whether you accept convergence and/or irreducible normativity), I think you'd be somewhere between 1 and 3. I did say that those views might be on a spectrum depending on which areas of Normativity overall you accept, but I didn't consider splitting up ethics into specific subdomains, each of which might have convergence or not:

Depending on which of the arguments you accept, there are four basic options. These are extremes of a spectrum, as while the Normativity argument is all-or-nothing, the Convergence argument can come by degrees for different types of normative claims (epistemic, practical and moral)

Assuming that it is possible to cleanly separate population ethics from 'preference utilitarianism', it is consistent, though quite counterintuitive, to demand reflective coherence in our non-population ethical views but allow whatever we want in population ethics (this would be view 1 for most ethics but view 3 for population ethics).

(This still strikes me as exactly what we'd expect to see halfway to reaching convergence - the weirder and newer subdomain of ethics still has no agreement, while we have reached greater agreement on questions we've been working on for longer.)

It sounds like you're contrasting my statement from The Case for SFE ("fit all one’s moral intuitions into an overarching theory based solely on intuitively appealing axioms") with "arbitrarily halting the search for coherence" / giving up on ethics playing a role in decision-making. But those are not the only two options: you can have some universal moral principles, but leave a lot of population ethics underdetermined.

Your case for SFE was intended to defend a view of population ethics - that there is an asymmetry between suffering and happiness. If we've decided that 'population ethics' is to remain undetermined, that is we adopt view 3 for population ethics, what is your argument (that SFE is an intuitively appealing explanation for many of our moral intuitions) meant to achieve? Can't I simply declare that my intuitions say different, and then we have nothing more to discuss, if we already know we're going to leave population ethics undetermined?

Comment by sdm on Forecasting Thread: AI Timelines · 2020-08-26T14:35:28.173Z · LW · GW

The 'progress will be continuous' argument, to apply to our near future, does depend on my other assumptions - mainly that the breakthroughs on that list are separable, so agentive behaviour and long-term planning won't drop out of a larger GPT by themselves and can't be considered part of just 'improving language model accuracy'.

We currently have partial progress on human-level language comprehension, a bit on cumulative learning, but near zero on managing mental activity for long-term planning, so if we were to suddenly reach human level on long-term planning in the next 5 years, that would probably involve a discontinuity, which I don't think is very likely for the reasons given here.

If language models scale to near-human performance but the other milestones don't fall in the process, and my initial claim is right, that gives us very transformative AI but not AGI. I think that the situation would look something like this:

If GPT-N reaches par-human:

discovering new action sets - still missing
managing its own mental activity - still missing
(?) cumulative learning - possibly still missing
human-like language comprehension - covered by GPT-N
perception and object recognition - already solved
efficient search over known facts - already solved

So there would be 2 (maybe 3?) breakthroughs remaining. It seems like you think just scaling up a GPT will also resolve those other milestones, rather than just giving us human-like language comprehension. Whereas if I'm right, and those curves do extrapolate, what we would get at the end would be an excellent text generator, but it wouldn't be an agent, wouldn't be capable of long-term planning, and couldn't be accurately described as having a utility function over the states of the external world. I don't see any reason why trivial extensions of GPT would be able to do those things either, since they seem like problems that are just as hard as human-like language comprehension. GPT seems to be making some progress on cumulative learning, though it might need some RL-based help with that, but none at all on managing mental activity for long-term planning or on discovering new action sets.

As an additional argument, admittedly from authority - Stuart Russell also clearly sees human-like language comprehension as only one of several really hard and independent problems that need to be solved.

A humanlike GPT-N would certainly be a huge leap into a realm of AI we don't know much about, so we could be surprised and discover that agentive behaviour and a utility function over states of the external world spontaneously appear in a good enough language model. But that argument has to be made, and you need both that argument to hold and GPT to keep scaling for us to reach AGI in the next five years. I don't see the conjunction of those two as that likely - it seems as though your argument rests solely on whether GPT scales or not, when there's also this other conceptual premise that's much harder to justify.

I'm also not sure if I've seen anyone make the argument that GPT-N will also give us these specific breakthroughs - but if you have reasons to think GPT scaling would solve all the remaining barriers to AGI, I'd be interested to hear them. Note that this isn't the same as just pointing out how impressive the results of scaling up GPT could be - Gwern's piece here, for example, seems to be arguing for a scenario more like what I've envisaged, where GPT-N ends up a key piece of some future AGI but just provides some of the background 'world model':

Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.

If GPT does scale, and we get human-like language comprehension in 2025, that will mean we're moving up that list much faster than expected, which in turn suggests that there might not be a large number of additional discoveries required to make the other breakthroughs, and that they might also occur within the Deep Learning paradigm, relatively soon. I think that if this happens, there's a reasonable chance that when we do build an AGI, a big part of its internals will look like a GPT, as gwern suggested, but by then we're already long past simply scaling up existing systems.

Alternatively, perhaps you're not including agentive behaviour in your definition of AGI - but a par-human text generator for most tasks that isn't capable of discovering new action sets or managing its own mental activity is, I think, a 'mere' transformative AI and not a genuine AGI.

Comment by sdm on SDM's Shortform · 2020-08-25T15:56:57.852Z · LW · GW

So to sum up, a very high-level summary of the steps in this method of preference elicitation and aggregation would be:

    1. With a mixture of normative assumptions and multi-channel information (approval and actions) as inputs, use a reward-modelling method to elicit the debiased preferences of many individuals.
      1. Determining whether there actually are significant differences between stated and revealed preferences when performing reward modelling is the first step to using multi-channel information to effectively separate biases from preferences.
    2. Create 'proxy agents' using the reward model developed for each human (this step is where intent-aligned amplification can potentially occur).
    3. Place the proxies in an iterated voting situation which tends to produce sensible convergent results. The use of RL proxies here can be compared to the use of human proxies in liquid democracy.
      1. Which voting mechanisms tend to work in iterated situations with RL agents can be determined in other experiments (probably with purely artificial agents)
    4. Run the voting mechanism until an unambiguous winner is decided, using methods like those given in this paper.

This seems like a reasonable procedure for extending a method that is aligned to one human's preferences (steps 1 and 2) to produce sensible results when trying to align to an aggregate of human preferences (steps 3 and 4). It reduces reliance on the specific features of any one voting method. Other than the insight that multiple channels of information might help, all the standard unsolved problems with preference learning from one human remain.

Even though we can't yet align an AGI to one human's preferences, trying to think about how to aggregate human preferences in a way that is scalable isn't premature, as has sometimes been claimed.

In many 'non-ambitious' hypothetical settings where we aren't trying to build an AGI sovereign over the whole world (for example, designing a powerful AI to govern the operations of a hospital), we still need to be able to aggregate preferences sensibly and stably. This method would do well at such intermediate scales, as it doesn't approach the question of preference aggregation from a 'final' ambitious value-learning perspective but instead tries to look at preference aggregation the same way we look at elicitation, with an RL-based iterative approach to reaching a result.

However, if you did want to use such a method to try and produce the fabled 'final utility function of all humanity', it might not give you Humanity's CEV, since some normative assumptions (that preferences count equally, and in the way given by the voting mechanism) are built in. By analogy with CEV, I called the idealized result of this method a coherent extrapolated framework (CEF). This is a more normatively direct method of aggregating values than CEV (since you fix a particular method of aggregating preferences in advance): it extrapolates from a voting framework rather than from our volition, more broadly (and vaguely) defined - hence CEF.

Comment by sdm on A voting theory primer for rationalists · 2020-08-25T13:00:09.261Z · LW · GW
Kenneth Arrow proved that the problem that Condorcet (and Llull) had seen was in fact a fundamental issue with any ranked voting method. He posed 3 basic "fairness criteria" and showed that no ranked method can meet all of them:
Ranked unanimity, Independence of irrelevant alternatives, Non-dictatorial

I've been reading up on voting theory recently, and Arrow's result is that the only voting method which always produces a definite transitive preference ranking, picks the unanimous answer if one exists, and doesn't change its ranking depending on irrelevant alternatives is 'one man, one vote':

“Ankh-Morpork had dallied with many forms of government and had ended up with that form of democracy known as One Man, One Vote. The Patrician was the Man; he had the Vote.”

In my opinion, aside from the utilitarian perspective offered by VSE, the key to evaluating voting methods is an understanding of strategic voting; this is what I'd call the "mechanism design" perspective. I'd say that there are 5 common "anti-patterns" that voting methods can fall into; either where voting strategy can lead to pathological results, or vice versa.

One recent extension to these statistical approaches is to use RL agents in iterated voting and examine their convergence behaviour. The idea is that we embrace the inevitable impossibility results (such as the Arrow and GS theorems) and treat agents' ability to vote strategically as an opportunity to reach stable outcomes. This paper uses very simple Q-learning agents with a few different policies (epsilon-greedy, greedy and upper confidence bound) in an iterated voting game, and gets behaviour that seems sensible. Many thousands of rounds of iterated voting aren't practical for real-world elections, but for preference elicitation in other contexts (such as value learning) this might be useful as a way to estimate people's underlying utilities as accurately as possible.

Comment by sdm on Open & Welcome Thread - August 2020 · 2020-08-24T14:14:25.444Z · LW · GW

A first actually credible claim of coronavirus reinfection? Potentially good news as the patient was asymptomatic and rapidly produced a strong antibody response.

Comment by sdm on Forecasting Thread: AI Timelines · 2020-08-23T16:32:34.429Z · LW · GW

Here's my answer. I'm pretty uncertain compared to some of the others!

[Image: AI Forecast]

First, I'm assuming that by AGI we mean an agent-like entity that can do the things associated with general intelligence, including things like planning towards a goal and carrying that out. If we end up in a CAIS-like world where there is some AI service or other that can do most economically useful tasks, but nothing with very broad competence, I count that as never developing AGI.

I've been impressed with GPT-3, and could imagine it or something like it scaling to produce near-human level responses to language prompts in a few years, especially with RL-based extensions.

But, following the list (below) of missing capabilities by Stuart Russell, I still think things like long-term planning would elude GPT-N, so it wouldn't be agentive general intelligence. You might get those behaviours with trivial extensions of GPT-N, but I don't think that's very likely.

That's why I think AGI before 2025 is very unlikely (not enough time for anything except scaling up of existing methods). This is also because I tend to expect progress to be continuous, though potentially quite fast, and going from current AI to AGI in less than 5 years requires a very sharp discontinuity.

AGI before 2035 or so happens if systems quite a lot like current deep learning, but not just trivial extensions of it, can do the job - this seems reasonable to me on the inside view - e.g. it takes us less than 15 years to take GPT-N and add layers on top of it that handle things like planning and discovering new actions. This is probably my 'inside view' answer.

I put a lot of weight on a tail peaking around 2050 because of how quickly we've advanced up this 'list of breakthroughs needed for general intelligence' -

There is this list of remaining capabilities needed for AGI in an older post I wrote; of the capabilities below, the one I'd expect a 'GPT-6' to have is human-like language comprehension (with perhaps some progress on cumulative learning):

Stuart Russell’s List

human-like language comprehension

cumulative learning

discovering new action sets

managing its own mental activity

For reference, I’ve included two capabilities we already have that I imagine being on a similar list in 1960

perception and object recognition

efficient search over known facts

So we'd have discovering new action sets and managing mental activity - effectively, the things that facilitate long-range complex planning - remaining.

So (very oversimplified) if around the 1980s we had efficient search algorithms, by 2015 we had image recognition (basic perception) and by 2025 we have language comprehension courtesy of GPT-8, that leaves cumulative learning (which could be obtained by advanced RL?), then discovering new action sets and managing mental activity (no idea). It feels a bit odd that we'd breeze past all the remaining milestones in one decade after it took ~6 decades to get to where we are now. Say progress has sped up to be twice as fast; then it's 3 more decades to go. Add to this the economic evidence from things like Modelling the Human Trajectory, which suggests a roughly similar time period of around 2050.

Finally, I think it's unlikely but not impossible that we never build AGI and instead go for tool AI or CAIS, most likely because we've misunderstood the incentives such that AGI isn't actually economical, or because agentive behaviour doesn't arise easily. Then there's the small (few percent) chance of a catastrophic or existential disaster which wrecks our ability to invent things. This is the one I'm most unsure about - I put 15% for both, but it may well be higher.

Comment by sdm on SDM's Shortform · 2020-08-23T15:57:40.177Z · LW · GW

I don't think that excuse works in this case - I didn't give it a 'long-winded frame', just that brief sentence at the start, and then the list of scenarios, and even though I reran it a couple of times on each to check, the 'cranberry/grape juice kills you' outcome never arose.

So, perhaps they switched directly from no prompt to an incredibly long-winded and specific prompt without checking what was actually necessary for a good answer? I'll point out that I didn't really attempt any sophisticated prompt programming either - that was literally the first sentence I thought of!

Comment by sdm on SDM's Shortform · 2020-08-23T12:31:37.767Z · LW · GW

Gary Marcus, noted sceptic of Deep Learning, wrote an article with Ernest Davis:

GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about

The article purports to give six examples of GPT-3's failure - Biological, Physical, Social, Object and Psychological reasoning and 'non sequiturs'. Leaving aside that GPT-3 works on Gary's earlier GPT-2 failure examples, and that it seems as though he specifically searched out weak points by testing GPT-3 on many more examples than were given, something a bit odd is going on with the results they gave. I got better results when running his prompts on AI Dungeon.

With no reruns, randomness = 0.5, I gave Gary's questions (all six gave answers considered 'failures' by Gary) to GPT-3 via AI Dungeon with a short scene-setting prompt, and got good answers to 4 of them, and reasonable vague answers to the other 2:

This is a series of scenarios describing a human taking actions in the world, designed to test physical and common-sense reasoning.
1) You poured yourself a glass of cranberry juice, but then you absentmindedly poured about a teaspoon of grape juice into it. It looks okay. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you take another drink.
2) You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway, so to get it into the living room, you will have to  move furniture. This means that some people will be inconvenienced.
3) You are a defense lawyer and you have to go to court today. Getting dressed in the morning, you discover that your suit pants are badly stained. However, your bathing suit is clean and very stylish. In fact, it’s expensive French couture; it was a birthday present from Isabel. You decide that you should wear it because you won't look professional in your stained pants, but you are worried that the judge will think you aren't taking the case seriously if you are wearing a bathing suit.
4) Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?
5) Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will prefer a bottom."
6) At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because it was a menthol, and it ruined the taste. So I added a little more sugar to counteract the menthol, and then I noticed that my cigarette had fallen into the glass and was floating in the lemonade.

For 1), Gary's example ended with 'you are now dead', whereas I got a reasonable, if short, continuation - success.

2) - the answer is vague enough to be a technically correct solution, 'move furniture' = tilt the table, but since we're being strict I'll count it as a failure. Gary's example was a convoluted attempt to saw the door in half, clearly mistaken.

3) is very obviously intended to trick the AI into endorsing the bathing suit answer - in fact, it feels like a classic priming trick that might trip up a human! But in my version GPT-3 rebels against the attempt and notices the incongruity of wearing a bathing suit to court, so it counts as a success. Gary's example didn't include the worry that a bathing suit was inappropriate - arguably not a failure, but never mind, let's move on.

4) is actually a complete prompt by itself, so the AI didn't have to do anything - GPT-3 doesn't care about answering questions, just about continuing text with high probability. Gary's answer was 'I have a lot of clothes', and no doubt he'd call both 'evasion', so to be strict we'll agree with him and count that as a failure.

5) Trousers are called 'bottoms', so that's right. Gary would call it wrong since 'the intended continuation' was "He will make you take it back", but that's absurdly unfair - that's not the only answer a human being might give - so I have to say it's correct. Gary's example 'lost track of the fact that Penny is advising Janet against getting a top', which didn't happen here, so that's acceptable.

Lastly, 6) is a slightly bizarre but logical continuation of an intentionally weird prompt - so correct. It also demonstrates correct physical reasoning - stirring a drink with a cigarette won't be good for the taste. Gary's answer wandered off-topic and started talking about cremation.

So, 4/6 correct on an intentionally deceptive and adversarial set of prompts, and that's on a fairly strict definition of correct. 2) and 4) are arguably not wrong, even if evasive and vague. More to the point, this was on an inferior version of GPT-3 to the one Gary used, the Dragon model from AI Dungeon!

I'm not sure what's going on here - is it the initial prompt saying it was 'testing physical and common sense reasoning'? Was that all it took?

Comment by sdm on Learning human preferences: optimistic and pessimistic scenarios · 2020-08-21T16:40:21.362Z · LW · GW

Glad you think so! I think that methods like using multiple information sources might be a useful way to reduce the number of (potentially mistaken) normative assumptions you need in order to model a single human's preferences.

The other area of human preference learning where you seem, inevitably, to need a lot of strong normative assumptions is in preference aggregation. If we assume we have elicited the preferences of lots of individual humans, and we're then trying to aggregate their preferences (with each human's preference represented by a separate model) I think the same basic principle applies, that you can reduce the normative assumptions you need by using a more complicated voting mechanism, in this case one that considers agents' ability to vote strategically as an opportunity to reach stable outcomes. 

I talk about this idea here. As with using approval/actions to improve the elicitation of an individual's preferences, you can't avoid making any normative assumptions by using a more complicated aggregation method, but perhaps you end up having to make fewer of them. Very speculatively, if you can combine a robust method of eliciting preferences with few inbuilt assumptions with a similarly robust method of aggregating preferences, you're on your way to a full solution to ambitious value learning.

Comment by sdm on SDM's Shortform · 2020-08-20T23:10:02.446Z · LW · GW

Modelling the Human Trajectory or ‘How I learned to stop worrying and love Hegel’.

Rohin’s opinion: I enjoyed this post; it gave me a visceral sense for what hyperbolic models with noise look like (see the blog post for this, the summary doesn’t capture it). Overall, I think my takeaway is that the picture used in AI risk of explosive growth is in fact plausible, despite how crazy it initially sounds.

One thing this post led me to consider is that when we bring together various fields, the evidence for 'things will go insane in the next century' is stronger than any specific claim about (for example) AI takeoff. What is the other evidence?

We're probably alone in the universe, and anthropic arguments tend to imply we're living at an incredibly unusual time in history. Isn't that what you'd expect to see in the same world where there is a totally plausible mechanism that could carry us a long way up this line, in the form of AGI and eternity in six hours? All the pieces are already there, and they only need to be approximately right for our lifetimes to be far weirder than those of people who were e.g. born in 1896 and lived to 1947 - which was weird enough, but that should be your minimum expectation.

In general, there are three categories of evidence that things are likely to become very weird over the next century, or that we live at the hinge of history:

  1. Specific mechanisms around AGI - possibility of rapid capability gain, and arguments from exploratory engineering

  2. Economic and technological trend-fitting predicting explosive growth in the next century

  3. Anthropic and Fermi arguments suggesting that we live at some extremely unusual time

All of these are evidence for such a claim. 1) is because a superintelligent AGI takeoff is just a specific example of how the hinge occurs. 3) already argues directly for that claim, but how does 2) fit in with 1) and 3)?

There is something a little strange about calling a fast takeoff from AGI and whatever was driving superexponential growth throughout all of history the same trend. It would require a huge cosmic coincidence for there always to be superexponential growth: as soon as whatever was driving it until now (population growth plus growth in wealth per capita, say) runs out in the great stagnation (which is visible as a tiny blip on the RHS of the double-log plot), AGI takes over and pushes us up the same trend line. That's clearly not possible as a coincidence, so if AGI is what takes us up the rest of that trend line, there would have to be some factor responsible for both - a factor that was at work in the founding of Jericho but predestined that AGI would be invented and cause explosive growth in the 21st century, rather than the 19th or the 23rd.

For AGI to be the driver of the rest of that growth curve, there has to be a single causal mechanism that keeps us on the same trend and includes AGI as its final step - if we say we are agnostic about what that mechanism is, we can still call 2) evidence for us living at the hinge point, though we have to note that there is a huge blank spot in need of explanation. Is there anything that can fill it to complete the picture?

The mechanism proposed in the article seems like it could plausibly include AGI.

If technology is responsible for the growth rate, then reinvesting production in technology will cause the growth rate to be faster. I'd be curious to see data on what fraction of GWP gets reinvested in improved technology and how that lines up with the other trends.

But even though the drivers seem superficially similar (they are both about technology), the claim here is that one very specific technology will generate explosive growth, not that technology in general will - and it seems strange that AGI would follow the same growth curve as one caused by reinvesting more GWP in improving ordinary technology, which doesn't improve your own ability to think in the way that AGI would.

As for precise timings, the great stagnation (the last 30ish years) just seems like it would stretch out the timeline a bit, so we shouldn't take the 2050s too seriously - much as the last 70 years fit an exponential trend line, there's really no way to make that fit the overall data, as that post makes clear.

Comment by sdm on Open & Welcome Thread - August 2020 · 2020-08-20T11:45:22.501Z · LW · GW

Many alignment approaches require at least some initial success at directly eliciting human preferences to get off the ground - there have been some excellent recent posts about the problems this presents. In part because of arguments like these, there has been far more focus on the question of preference elicitation than on the question of preference aggregation:

The maximally ambitious approach has a natural theoretical appeal, but it also seems quite hard. It requires understanding human preferences in domains where humans are typically very uncertain, and where our answers to simple questions are often inconsistent, like how we should balance our own welfare with the welfare of others, or what kinds of activities we really want to pursue vs. enjoying in the moment...
I have written about this problem, pointing out that it is unclear how you would solve it even with an unlimited amount of computing power. My impression is that most practitioners don’t think of this problem even as a long-term research goal — it’s a qualitatively different project without direct relevance to the kinds of problems they want to solve.

I think that this has a lot of merit, but it has sometimes been interpreted as saying that any work on preference aggregation or idealization, before we have a robust way to elicit preferences, is premature. I don't think this is right - in many 'non-ambitious' settings where we aren't trying to build an AGI sovereign over the whole world (for example, designing a powerful AGI to govern the operations of a hospital) you still need to be able to aggregate preferences sensibly and stably.

I've written a rough shortform post with some thoughts on this problem which doesn't approach the question from a 'final' ambitious value-learning perspective but instead tries to look at aggregation the same way we look at elicitation, with an imperfect, RL-based iterative approach to reaching consensus.

...
The Kidney exchange paper elicited preferences from human subjects (using repeated pairwise comparisons) and then aggregated them using the Bradley-Terry model. You couldn't use such a simple statistical method to aggregate quantitative preferences over continuous action spaces, like the preferences that would be learned from a human via a complex reward model. Also, any time you try to use some specific one-shot voting mechanism you run into various impossibility theorems which seem to force you to give up some desirable property.
One approach that may be more robust against errors in a voting mechanism, and easily scalable to more complex preference profiles is to use RL not just for the preference elicitation, but also for the preference aggregation. The idea is that we embrace the inevitable impossibility results (such as Arrow and GS theorems) and consider agents' ability to vote strategically as an opportunity to reach stable outcomes. 
This paper uses very simple Q-learning agents with a few different policies - epsilon-greedy, greedy and upper confidence bound, in an iterated voting game, and gets behaviour that seems sensible. (Note the similarity and differences with the moral parliament, where a particular one-shot voting rule is justified a priori and then used.)
The fact that this paper exists is a good sign because it's very recent and the methods it uses are very simple - it's pretty much just a proof of concept, as the authors state - so that tells me there's a lot of room for combining more sophisticated RL with better voting methods.

Approaches like these seem especially urgent if AI timelines are shorter than we expect, which has been argued based on results from GPT-3. If this is the case, we might need to be dealing with questions of aggregation relatively soon with methods somewhat like current deep learning, and so won't have time to ensure that we have a perfect solution to elicitation before moving on to aggregation.

Comment by sdm on SDM's Shortform · 2020-08-20T11:27:17.491Z · LW · GW

Improving preference learning approaches

When examining value learning approaches to AI Alignment, we run into two classes of problem: we want to understand how to elicit preferences, which is very difficult (even in theory, with infinite computing power), and we want to know how to aggregate preferences stably and correctly, which is not just difficult but runs into complicated social choice and normative ethical issues.

Many research programs say the second of these questions is less important than the first, especially if we expect continuous takeoff with many chances to course-correct, and a low likelihood of an AI singleton with decisive strategic advantage. For many, building an AI that can reliably extract and pursue the preferences of one person is good enough.

Christiano calls this 'the narrow approach' and sees it as a way to sidestep many of the ethical issues, including those around social choice, which the 'ambitious' approaches have to confront.

We want to build machines that help us do the things we want to do, and to that end they need to be able to understand what we are trying to do and what instrumental values guide our behavior. To the extent that our “preferences” are underdetermined or inconsistent, we are happy if our systems at least do as well as a human, and make the kinds of improvements that humans would reliably consider improvements...
But it’s not clear that anything short of the maximally ambitious approach can solve the problem we ultimately care about.

I think that the ambitious approach is still worth investigating, because it may well eventually need to be solved, and because it may need to be addressed in a more limited form even on the narrow approach (one could imagine an AGI with a lot of autonomy having to trade off the preferences of, say, a hundred different people). But even the 'narrow' approach raises difficult psychological issues about how to distinguish legitimate preferences from bias - questions of elicitation. In other words, the cognitive science issues around elicitation (distinguishing bias from legitimate preference) must be resolved for any kind of preference learning to work, and the social choice and ethical issues around preference aggregation need at least preliminary solutions for any alignment method that aims to apply to more than one person (even if final, provably correct solutions to aggregation are only needed if designing a singleton with decisive strategic advantage).

I believe that I've located two areas that are under- or unexplored for improving the ability of reward modelling approaches to elicit and to aggregate human preferences. These are: using multiple information sources from a human (approval and actions) whose divergences help extract unbiased preferences, and using RL proxy agents in iterated voting to reach consensus preference aggregations, rather than some direct statistical method. Neither of these is a complete solution, of course, for reasons discussed e.g. here by Stuart Armstrong, but they could nonetheless help.

Improving preference elicitation: multiple information sources

Eliciting the unbiased preferences of an individual human is extremely difficult, for reasons given here.

The agent's actions can be explained by their beliefs and preferences[1], and by their biases: by this, we mean the way in which the action selector differs from an unboundedly rational expected preference maximiser.
The results of the Occam's razor paper imply that preferences (and beliefs, and biases) cannot be deduced separately from knowing the agent's policy (and hence, a fortiori, from any observations of the agent's behaviour).

...

To get around the impossibility result, we need "normative assumptions": assumptions about the preferences (or beliefs, or biases) of the agent that cannot be deduced fully from observations.
Under the optimistic scenario, we don't need many of these, at least for identifying human preferences. We can label a few examples ("the anchoring bias, as illustrated in this scenario, is a bias"; "people are at least weakly rational"; "humans often don't think about new courses of action they've never seen before", etc...). Call this labelled data[2] D.
The algorithm now constructs categories preferences*, beliefs*, and biases* - these are the generalisations that it has achieved from D

Yes, even on the 'optimistic scenario' we need external information of various kinds to 'debias'. However, this external information can come from a human interacting with the AI, in the form of human approval of trajectories or actions taken or proposed by an AI agent, on the assumption that since our stated and revealed preferences diverge, there will sometimes be differences in what we approve of and what we do that are due solely to differences in bias.

This is still technically external to observing the human's behaviour, but it is essentially a second input channel for information about human preferences and biases. This only works, of course, if humans tend to approve of different things than they actually do, in a way influenced by bias (otherwise you have the same information as you'd get from actions, which helps with improving accuracy but not with debiasing, see here), which is the case at least some of the time.

In other words, beliefs and preferences are unchanged whether the agent acts or approves, but the 'approval selector' sometimes differs from the 'action selector'; based on what does and does not change, you can try to infer what originated from legitimate beliefs and preferences and what originated from variation between the approval and action selectors, which must be bias.

So, for example, if we conducted a principal component analysis on π, we would expect that the components would all be mixes of preferences/beliefs/biases.

So a PCA performed on the approval would produce a mix of beliefs, preferences and (different) biases. Underlying preferences are, by specification, equally represented by human actions and by human approval of actions taken (since, no matter what, they are your preferences), but many biases don't exhibit this pattern - for example, we discount more over time in our revealed preferences than in our stated preferences. What we approve of typically represents a less (or at least differently) biased response than what we actually do.

There has already been research on combining information about reward models from multiple sources to infer a better overall reward model, but not, as far as I know, on actions and approval specifically as differently biased sources of information.

CIRL ought to extract our revealed preferences (since it's based on behavioural policy) while a method like reinforcement learning from human preferences should extract our stated preferences - that might be a place to start, at least on validating that there actually are relevant differences caused by differently strong biases in our stated vs revealed preferences, and that the methods actually do end up with different policies.

The goal here would be to have some kind of 'dual channel' preference learner that extracts beliefs and preferences from biased actions and approval by examining what varies. I'm sure you'd still need labelling and explicit information about what counts as a bias, but there might need to be a lot less than with a single information source. How much less (how much extra information you get from such divergences) seems like an empirical question. A useful first step would be finding out how common it is for divergences between stated and revealed preferences to actually change the policies learned by agents that infer human preferences from actions versus from approval. Stuart Armstrong:

In the pessimistic scenario, human preferences, biases, and beliefs are twisted together in a far more complicated way, and cannot be separated by a few examples.
In contrast, take examples of racial bias, hindsight bias, illusion of control, or naive realism. These biases all seem to be quite different from the anchoring bias, and quite different from each other. At the very least, they seem to be of different "type signature".
So, under the pessimistic scenario, some biases are much closer to preferences than generic biases (and generic preferences) are to each other.

What I've suggested should still help at least somewhat in the pessimistic scenario - unless preferences/beliefs vary more than biases do when you switch between looking at approval and looking at actions, you can still gain some information on underlying preferences and beliefs by seeing how approval and actions differ.

Of the difficult examples you gave, racial bias at least varies between actions and approval. Implementing different reward modelling algorithms and messing around with them to try and find ways to extract unbiased preferences from multiple information sources might be a useful research agenda.

There has already been research done on using multiple information sources to improve the accuracy of preference learning - Reward-rational implicit choice - but not specifically on using the divergences between different sources of information from the same agent to learn things about the agent's unbiased preferences.
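
As a toy illustration of the kind of 'dual channel' check suggested here, consider the sketch below. Everything in it is invented for illustration: the synthetic choice data, the Boltzmann-rational choice model, and the single discount-factor parameter standing in for a bias. The idea is just to fit the same choice model separately to an action channel and an approval channel and see which fitted parameters diverge:

```python
import numpy as np

rng = np.random.default_rng(0)

def choice_probs(immediate, delayed, gamma, beta=5.0):
    """Boltzmann-rational probability of choosing the delayed option,
    given a per-step discount factor gamma and rationality beta."""
    return 1.0 / (1.0 + np.exp(-beta * (gamma * delayed - immediate)))

# Synthetic task: repeated choices between a small reward now and a larger
# reward one step later. The underlying preference (more reward is better)
# is the same in both channels; only the discounting differs.
immediate = rng.uniform(0.3, 1.0, size=500)
delayed = rng.uniform(0.3, 1.3, size=500)

TRUE_GAMMA_ACTION = 0.6     # what people actually do: heavy discounting
TRUE_GAMMA_APPROVAL = 0.95  # what people endorse: mild discounting

actions = rng.random(500) < choice_probs(immediate, delayed, TRUE_GAMMA_ACTION)
approvals = rng.random(500) < choice_probs(immediate, delayed, TRUE_GAMMA_APPROVAL)

def fit_gamma(choices):
    """Maximum-likelihood discount factor for one channel, by grid search."""
    grid = np.linspace(0.01, 1.0, 200)
    def log_lik(g):
        p = choice_probs(immediate, delayed, g)
        return np.sum(np.where(choices, np.log(p), np.log(1 - p)))
    return grid[np.argmax([log_lik(g) for g in grid])]

print("fitted discount, action channel:  ", round(fit_gamma(actions), 2))
print("fitted discount, approval channel:", round(fit_gamma(approvals), 2))
# A parameter that agrees across the two fits behaves like a preference;
# one that diverges (as the discount factor does here) is a bias candidate.
```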

Improving preference aggregation: iterated voting games

In part because of arguments like these, there has been less focus on the aggregation side of things than on the direct preference learning side.

Christiano says of methods like CEV, which aim to extrapolate what I 'really want' far beyond what my current preferences are: ‘most practitioners don’t think of this problem even as a long-term research goal — it’s a qualitatively different project without direct relevance to the kinds of problems they want to solve’. This is effectively a statement of the Well-definedness consideration when sorting through value definitions - our long-term 'coherent' or 'true' preferences currently aren't understood well enough to guide research, so we need to restrict ourselves to more direct normativity: extracting the actual preferences of existing humans.

However, I think that it is important to get on the right track early - even if we never have cause to build a powerful singleton AI that has to aggregate all the preferences of humanity, there will still probably be smaller-scale situations where the preferences of several people need to be aggregated or traded off. Shifting a human preference learner from a single human's preferences to a small group's could produce erroneous results due to distributional shift, potentially causing alignment failures, so even if we aren't trying for maximally ambitious value learning it might still be worth investigating preference aggregation.

There has been some research done on preference aggregation for AIs learning human values, specifically in the context of Kidney exchanges:

We performed statistical modeling of participants’ pairwise comparisons between patient profiles in order to obtain weights for each profile. We used the Bradley-Terry model, which treats each pairwise comparison as a contest between a pair of players
We have shown one way in which moral judgments can be elicited from human subjects, how those judgments can be statistically modelled, and how the results can be incorporated into the algorithm. We have also shown, through simulations, what the likely effects of deploying such a prioritization system would be, namely that under demanded pairs would be significantly impacted but little would change for others. We do not make any judgment about whether this conclusion speaks in favor of or against such prioritization, but expect the conclusion to be robust to changes in the prioritization such as those that would result from a more thorough process, as described in the previous paragraph.

The Kidney exchange paper elicited preferences from human subjects (using repeated pairwise comparisons) and then aggregated them using the Bradley-Terry model. You couldn't use such a simple statistical method to aggregate quantitative preferences over continuous action spaces, like the preferences that would be learned from a human via a complex reward model. Also, any time you try to use some specific one-shot voting mechanism you run into various impossibility theorems which seem to force you to give up some desirable property.
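
For concreteness, here is a minimal sketch of Bradley-Terry aggregation over pairwise comparisons. The patient profiles, comparison counts and the MM fitting loop below are all invented for illustration and are not taken from the paper:

```python
import numpy as np

# Invented data: wins[i, j] = number of times profile i was preferred to
# profile j in pairwise comparisons (three hypothetical patient profiles).
wins = np.array([
    [0, 12, 18],
    [8,  0, 15],
    [2,  5,  0],
], dtype=float)

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the standard MM updates:
    the probability that i beats j is modelled as p_i / (p_i + p_j)."""
    n = wins.shape[0]
    comparisons = wins + wins.T            # n_ij: total i-vs-j contests
    total_wins = wins.sum(axis=1)          # W_i: total wins for profile i
    p = np.ones(n)
    for _ in range(iters):
        denom = np.array([
            sum(comparisons[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            for i in range(n)
        ])
        p = total_wins / denom
        p /= p.sum()                       # strengths are only defined up to scale
    return p

weights = bradley_terry(wins)
print("aggregated profile weights:", np.round(weights, 3))
# These weights could then be fed into a prioritisation rule, which is
# roughly the role they play in the kidney-exchange work.
```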

One approach that may be more robust against errors in a voting mechanism, and more easily scalable to complex preference profiles, is to use RL not just for the preference elicitation but also for the preference aggregation. The idea is that we embrace the inevitable impossibility results (such as the Arrow and GS theorems) and treat agents' ability to vote strategically as an opportunity to reach stable outcomes.

This paper uses very simple Q-learning agents with a few different policies (epsilon-greedy, greedy and upper confidence bound) in an iterated voting game, and gets behaviour that seems sensible. (Note the similarities to, and differences from, the moral parliament, where a particular one-shot voting rule is justified a priori and then used.)

The fact that this paper exists is a good sign because it's very recent and the methods it uses are very simple - it's pretty much just a proof of concept, as the authors state - so that tells me there's a lot of room for combining more sophisticated RL with better voting methods.
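
The sketch below is only a toy reconstruction of that kind of setup, not the paper's actual code: the utilities, the plurality rule with alphabetical tie-breaking, the sincere initialisation of the Q-values and the learning hyperparameters are all made up. A few epsilon-greedy Q-learners repeatedly cast plurality ballots, each is rewarded with its utility for the round's winner, and with these numbers they typically settle on the compromise candidate, which sincere one-shot plurality voting would not have elected:

```python
import random
from collections import Counter

random.seed(0)

CANDIDATES = ["A", "B", "C"]

# Invented utilities for five voters over three candidates.
UTILITIES = [
    {"A": 1.0, "B": 0.6, "C": 0.0},
    {"A": 1.0, "B": 0.5, "C": 0.0},
    {"A": 0.0, "B": 0.7, "C": 1.0},
    {"A": 0.0, "B": 0.8, "C": 1.0},
    {"A": 0.4, "B": 1.0, "C": 0.4},
]

class EpsilonGreedyVoter:
    """A minimal Q-learner whose actions are the ballots it can cast."""
    def __init__(self, utility, epsilon=0.1, lr=0.1):
        self.utility, self.epsilon, self.lr = utility, epsilon, lr
        self.q = dict(utility)            # start from sincere valuations

    def vote(self):
        if random.random() < self.epsilon:
            return random.choice(CANDIDATES)      # explore
        return max(self.q, key=self.q.get)        # exploit

    def update(self, ballot, winner):
        reward = self.utility[winner]             # reward = utility of the winner
        self.q[ballot] += self.lr * (reward - self.q[ballot])

def plurality_winner(ballots):
    counts = Counter(ballots)
    best = max(counts.values())
    return min(c for c in counts if counts[c] == best)   # deterministic tie-break

voters = [EpsilonGreedyVoter(u) for u in UTILITIES]
for _ in range(5000):
    ballots = [v.vote() for v in voters]
    winner = plurality_winner(ballots)
    for v, b in zip(voters, ballots):
        v.update(b, winner)

final_ballots = [max(v.q, key=v.q.get) for v in voters]
print("learned ballots:", final_ballots)
print("stable winner under greedy play:", plurality_winner(final_ballots))
```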

Combining elicitation and aggregation

Having elicited preferences from each individual human (using methods like those above to 'debias'), we obtain a proxy agent representing each individual's preferences. Then these agents can be placed into an iterated voting situation until a convergent answer is reached.

That seems like the closest practical approximation to a CEV of a group of people that could be constructed with anything close to current methods - a pipeline from observed behaviour and elicited approval to a final aggregated decision about what to do based on overall preferences. Since it's a value learning framework that's extendible over any size of group, and somewhat indirect, you might call it a Coherent Extrapolated Framework (CEF), as I suggested last year.

Comment by sdm on Learning human preferences: optimistic and pessimistic scenarios · 2020-08-19T18:22:23.914Z · LW · GW
To get around the impossibility result, we need "normative assumptions": assumptions about the preferences (or beliefs, or biases) of the agent that cannot be deduced fully from observations.
Under the optimistic scenario, we don't need many of these, at least for identifying human preferences. We can label a few examples ("the anchoring bias, as illustrated in this scenario, is a bias"; "people are at least weakly rational"; "humans often don't think about new courses of action they've never seen before", etc...). Call this labelled data[2] D.
The algorithm now constructs categories preferences*, beliefs*, and biases* - these are the generalisations that it has achieved from D

Yes, even on the 'optimistic scenario' we need external information of various kinds to 'debias'. However, this external information can come from a human interacting with the AI, in the form of human approval of trajectories or actions taken or proposed by an AI agent, on the assumption that since our stated and revealed preferences diverge, there will sometimes be differences in what we approve of and what we do that are due solely to differences in bias.

This is still technically external to observing the human's behaviour, but it is essentially a second input channel for information about human preferences and biases. This only works, of course, if humans tend to approve of different things than they actually do, in a way influenced by bias (otherwise you have the same information as you'd get from actions, which helps with improving accuracy but not with debiasing, see here), which is the case at least some of the time.

In other words, beliefs and preferences are unchanged whether the agent acts or approves, but the 'approval selector' sometimes differs from the 'action selector'; based on what does and does not change, you can try to infer what originated from legitimate beliefs and preferences and what originated from variation between the approval and action selectors, which must be bias.

So, for example, if we conducted a principal component analysis on π, we would expect that the components would all be mixes of preferences/beliefs/biases.

So a PCA performed on the approval would produce a mix of beliefs, preferences and (different) biases. Underlying preferences are, by specification, equally represented by human actions and by human approval of actions taken (since, no matter what, they are your preferences), but many biases don't exhibit this pattern - for example, we discount more over time in our revealed preferences than in our stated preferences. What we approve of typically represents a less (or at least differently) biased response than what we actually do.

There has already been research on combining information about reward models from multiple sources to infer a better overall reward model, but not, as far as I know, on actions and approval specifically as differently biased sources of information.

CIRL ought to extract our revealed preferences (since it's based on behavioural policy) while a method like reinforcement learning from human preferences should extract our stated preferences - that might be a place to start, at least on validating that there actually are relevant differences caused by differently strong biases in our stated vs revealed preferences, and that the methods actually do end up with different policies.

The goal here would be to have some kind of 'dual channel' preference learner that extracts beliefs and preferences from biased actions and approval by examining what varies. I'm sure you'd still need labelling and explicit information about what counts as a bias, but there might need to be a lot less than with a single information source. How much less (how much extra information you get from such divergences) seems like an empirical question. A useful first step would be finding out how common it is for divergences between stated and revealed preferences to actually change the policies learned by agents that infer human preferences from actions versus from approval.

In the pessimistic scenario, human preferences, biases, and beliefs are twisted together in a far more complicated way, and cannot be separated by a few examples.
In contrast, take examples of racial bias, hindsight bias, illusion of control, or naive realism. These biases all seem to be quite different from the anchoring bias, and quite different from each other. At the very least, they seem to be of different "type signature".
So, under the pessimistic scenario, some biases are much closer to preferences than generic biases (and generic preferences) are to each other.

What I've suggested should still help at least somewhat in the pessimistic scenario - unless preferences/beliefs vary more than biases do when you switch between looking at approval and looking at actions, you can still gain some information on underlying preferences and beliefs by seeing how approval and actions differ.

Of the difficult examples you gave, racial bias at least varies between actions and approval. Implementing different reward modelling algorithms and messing around with them to try and find ways to extract unbiased preferences from multiple information sources might be a useful research agenda.

There has already been research done on using multiple information sources to improve the accuracy of preference learning - Reward-rational implicit choice - but not specifically on using the divergences between different sources of information from the same agent to learn things about the agent's unbiased preferences.

Comment by sdm on Open & Welcome Thread - August 2020 · 2020-08-15T16:38:27.193Z · LW · GW

Covid19Projections has been one of the most successful coronavirus models, in large part because it is as 'model-free' and simple as possible, using machine learning to infer the parameters of a simple SEIR model from death data only. This has proved useful because case numbers are skewed by varying numbers of tests, so deaths are a more consistently reliable metric. You can see the code here.

However, in countries doing a lot of testing, with a reasonable number of cases but very few deaths, like most of Europe, the model is not that informative, and essentially predicts near-zero deaths out to the limit of its forecast. This is expected - the model is optimised for the US.

Estimating SEIR parameters based on deaths works well when you have a lot of deaths to count; if you don't, you need another method. Estimating purely based on cases has its own pitfalls - see this from epidemic forecasting, which mistook an increase in testing in the UK in mid-July for a sharp jump in cases and wrongly inferred a brief jump in R_t. As far as I understand their paper, the estimate of R_t from case data adjusts for delays from infection to onset and for other things, but not for the positivity rate or for how good overall testing is.

This isn't surprising - there is no simple model that combines the test positivity rate and the number of cases to estimate the actual current number of infections. But perhaps you could use a Covid19pro-like method to learn such a mapping.

Very oversimplified, Covid19pro works like this:

Our COVID-19 prediction model adds the power of artificial intelligence on top of a classic infectious disease model. We developed a simulator based on the SEIR model (Wikipedia) to simulate the COVID-19 epidemic in each region. The parameters/inputs of this simulator are then learned using machine learning techniques that attempts to minimize the error between the projected outputs and the actual results. We utilize daily deaths data reported by each region to forecast future reported deaths. After some additional validation techniques (to minimize a phenomenon called overfitting), we use the learned parameters to simulate the future and make projections.

In other words, there are two functions: f estimates the SEIR (susceptible, exposed, infectious, recovered) parameters from the deaths reported up to some time t_0, and g projects future deaths from those parameters. Both functions are then optimised to minimise the error against the actual number of deaths observed at a later time t_1.

This oversimplification is deliberate:

Deaths data only: Our model only uses daily deaths data as reported by Johns Hopkins University. Unlike other models, we do not use additional data sources such as cases, testing, mobility, temperature, age distribution, air traffic, etc. While supplementary data sources may be helpful, they can also introduce additional noise and complexity which can notably skew results.

What I suggest is a slight increase in complexity, where we use a similar model except that we feed it paired test-positivity and case data instead of death data. The positivity rate (or tests per case) serves as a 'quality estimate' that tells you how much to trust the case data; that's how tests per case is treated by Our World in Data. We all know intuitively that if the positivity rate is going down but cases are going up, the increase might not be real, but that if the positivity rate and cases are both going up, the increase definitely is real.

What I'm suggesting is that we do something like this: keep the same learned-simulator structure, but give the parameter-estimation step the paired (tests per case, number of cases) series rather than the deaths series.

Now, you need reliable data on the number of people tested each week, but most of Europe has that. If you can learn a model that gives you a more accurate estimate of the SEIR parameters from combined case and tests-per-case data, then it should be better at predicting future infections. It won't necessarily predict future cases, since the number of future cases also depends on the number of tests conducted, which is subject to all sorts of random fluctuations that we don't care about when modelling disease transmission. So instead you could use the same loss function as the original covid19pro: minimising the difference between projected and actual deaths.

Hopefully the intuition that you can learn more from the pair (tests per case, number of cases) than from the number of cases or the number of deaths alone would be borne out, and a c19pro-like model could be trained to make high-quality predictions in places with few deaths using such paired data. You would still need some deaths for the loss function and for fitting the model.
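
To make the f/g structure concrete, here is a heavily simplified sketch, fitting on deaths only. It is not the covid19-projections code: every epidemiological constant and parameter range below is invented, the 'observed' deaths are synthetic, and a grid search stands in for the real model's machine-learning fit. The proposed extension would feed the fitting step paired case and tests-per-case data instead, while still scoring against deaths:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_deaths(beta, i0, days=120, pop=1e7,
                    incubation=4.0, infectious=5.0, ifr=0.01, death_delay=14):
    """Crude discrete-time SEIR; returns an array of daily deaths.
    All epidemiological constants here are made up for illustration."""
    s, e, i, r = pop - i0, 0.0, float(i0), 0.0
    deaths = np.zeros(days)
    for t in range(days):
        new_inf = beta * s * i / pop       # S -> E
        new_sympt = e / incubation         # E -> I
        new_rec = i / infectious           # I -> R
        s -= new_inf
        e += new_inf - new_sympt
        i += new_sympt - new_rec
        r += new_rec
        if t + death_delay < days:
            deaths[t + death_delay] = new_inf * ifr   # deaths lag infections
    return deaths

# "Observed" deaths: generated from hidden true parameters plus Poisson noise,
# standing in for the reported death counts the real model trains on.
observed = rng.poisson(simulate_deaths(beta=0.44, i0=50))

def fit_parameters(observed):
    """The 'f' step: search for the simulator parameters that make the
    'g' step (the simulator itself) reproduce the observed death curve."""
    best, best_err = None, np.inf
    for beta in np.linspace(0.2, 0.8, 31):
        for i0 in (10, 20, 50, 100, 200):
            err = np.sum((simulate_deaths(beta, i0) - observed) ** 2)
            if err < best_err:
                best, best_err = (beta, i0), err
    return best

print("recovered (beta, initial infections):", fit_parameters(observed))
# The proposal above keeps this overall structure but would hand the fitting
# step (case counts, tests per case) series instead of deaths, while still
# scoring candidate parameters against reported deaths.
```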