Posts

SDM's Shortform 2020-07-23T14:53:52.568Z
Modelling Continuous Progress 2020-06-23T18:06:47.474Z
Coronavirus as a test-run for X-risks 2020-06-13T21:00:13.859Z
Will AI undergo discontinuous progress? 2020-02-21T22:16:59.424Z
The Value Definition Problem 2019-11-18T19:56:43.271Z

Comments

Comment by sdm on Commentary on AGI Safety from First Principles · 2020-11-25T16:28:27.402Z · LW · GW

Yeah - this is a case where how exactly the transition goes seems to make a very big difference. If it's a fast transition to a singleton, altering the goals of the initial AI is going to be super influential. But if instead there are many generations of AIs that over time come to make up the majority of the economy, and eventually control everything, then predictably altering how that goes seems a lot harder, at least.

Comparing the entirety of the Bostrom/Yudkowsky singleton intelligence explosion scenario to the slower more spread out scenario, it's not clear that it's easier to predictably alter the course of the future in the first compared to the second.

In the first, assuming you successfully set the goals of the singleton, the hard part is over and the future can be steered easily because there are, by definition, no more coordination problems to deal with. But in the first, a superintelligent AGI could explode on us out of nowhere with little warning and a 'randomly rolled utility function', so the amount of coordination we'd need pre-intelligence explosion might be very large.

In the second slower scenario, there are still ways to influence the development of AI - aside from massive global coordination and legislation, there may well be decision points where two developmental paths are comparable in terms of short-term usefulness but one is much better than the other in terms of alignment or the value of the long-term future. 

Stuart Russell's claim that we need to replace 'the standard model' of AI development is one such example - if he's right, a concerted push now by a few researchers could alter how nearly all future AI systems are developed for the better. So different conditions have to be met for it to be possible to predictably alter the future long in advance on the slow transition model (multiple plausible AI development paths that could be universally adopted and have ethically different outcomes) compared to the fast transition model (the ability to anticipate when and where the intelligence explosion will arrive and do all the necessary alignment work in time), but it's not obvious to me that one set of conditions is easier to meet than the other.

 

For this reason, I think it's unlikely there will be a very clearly distinct "takeoff period" that warrants special attention compared to surrounding periods.

I think the period AI systems can, at least in aggregate, finally do all the stuff that people can do might be relatively distinct and critical -- but, if progress in different cognitive domains is sufficiently lumpy, this point could be reached well after the point where we intuitively regard lots of AI systems as on the whole "superintelligent."

This might be another case (like 'the AI's utility function') where we should just retire the term as meaningless, but I think that 'takeoff' isn't always a strictly defined interval, especially if we're towards the medium-slow end. The start of the takeoff has a precise meaning only if you believe that RSI (recursive self-improvement) is an all-or-nothing property. In this graph from a post of mine, the light blue curve has an obvious start to the takeoff where the gradient discontinuously changes, but what about the yellow line? There clearly is a takeoff in the sense that progress becomes very rapid, but there's no obvious start point; there is still a period, very different from our current period, that is reached in a relatively short space of time - so not 'very clearly distinct' but still 'warrants special attention'.
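To make the all-or-nothing vs gradual distinction concrete, here's a minimal toy sketch (my own illustration, not the model from the linked post): the only difference between the two trajectories is whether the self-improvement feedback switches on at a threshold or ramps up smoothly.

```python
# Toy capability trajectories (illustrative only). "step_ramp" turns RSI on
# all at once at a capability threshold, giving a curve with an obvious kink
# where the takeoff starts; "smooth_ramp" lets the feedback strength grow
# gradually, so progress still ends up very rapid but there is no single
# moment you can point to as the start of the takeoff.
import math

def simulate(ramp, steps=400, dt=0.05):
    capability, history = 1.0, []
    for _ in range(steps):
        feedback = ramp(capability)                       # strength of self-improvement feedback
        capability += dt * (0.5 + feedback * capability)  # baseline progress + RSI term
        history.append(capability)
    return history

step_ramp = lambda c: 0.5 if c > 5 else 0.0             # all-or-nothing RSI
smooth_ramp = lambda c: 0.5 / (1 + math.exp(-(c - 5)))  # gradually strengthening RSI

discontinuous, continuous = simulate(step_ramp), simulate(smooth_ramp)
```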

 

At this point I think it's easier to just discard the terminology altogether. For some agents, it's reasonable to describe them as having goals. For others, it isn't. Some of those goals are dangerous. Some aren't. 

Daniel Dennett's intentional stance is either a good analogy for the problem of "can't define what has a utility function" or just a rewording of the same issue. Dennett's original formulation doesn't discuss different types of AI systems or utility functions, ranging in 'explicit goal directedness' all the way from expected-minimax game players to deep RL to purely random agents, but instead discusses physical systems ranging from thermostats up to humans. Either way, if you agree with Dennett's formulation of the intentional stance, I think you'd also agree that it doesn't make much sense to speak of 'the utility function' as necessarily well-defined.

Comment by sdm on Covid 11/19: Don’t Do Stupid Things · 2020-11-20T18:42:48.555Z · LW · GW

Much of Europe went into strict lockdown. I was and am still skeptical that they were right to keep schools open, but it was a real attempt that clearly was capable of working, and it seems to be working.

The new American restrictions are not a real attempt, and have no chance of working.

The way I understand it is that 'being effective' means making an efficient choice that takes into account asymmetric risk, the value of information, and long-run trade-offs. This involves things like harsh early lockdowns, throwing endless money at contact tracing, and strict enforcement of isolation. Think Taiwan, South Korea.

Then 'trying' is adopting policies that have a reasonably good chance of working, but not having a plan if they don't work, not erring on the side of caution or taking asymmetric risk into account when you adopt the policies, and not responding to new evidence quickly. The schools thing is a perfect example - closing schools has costs, but so does keeping them open (it makes the lockdown less effective and therefore longer), and since it wasn't overwhelmingly clear that schools had to close to turn R under 1, leaving them open was treated as good enough. Partially funding tracing efforts, waiting until there's visibly no other choice and then calling a strict lockdown - that's 'trying'. Think the UK and France.

And then you have 'trying to try', which you explain in detail.

Dolly Parton helped fund the Moderna vaccine. Neat. No idea why anyone needed to do that, but still. Neat.

It's reassuring to know that if the administrative state and the pharmaceutical industry fails, we have Dolly Parton.

Comment by sdm on Some AI research areas and their relevance to existential safety · 2020-11-20T18:22:22.622Z · LW · GW

That said, I remain interested in more clarity on what you see as the biggest risks with these multi/multi approaches that could be addressed with technical research.

A (though not necessarily the most important) reason to think technical research into computational social choice might be useful is that examining specifically the behaviour of RL agents from a computational social choice perspective might alert us to ways in which coordination with future TAI might be similar or different to the existing coordination problems we face.

(i) make direct improvements in the relevant institutions, in a way that anticipates the changes brought about by AI but will most likely not look like AI research, 

It seems premature to say, in advance of actually seeing what such research uncovers, whether the relevant mechanisms and governance improvements are exactly the same as the improvements we need for good governance generally, or different. Suppose examining the behaviour of current RL agents in social dilemmas leads to a general result which in turn leads us to conclude there's a disproportionate chance TAI in the future will coordinate in some damaging way that we can resolve with a particular new regulation. It's always possible to say, solving the single/single alignment problem will prevent anything like that from happening in the first place, but why put all your hopes on plan A, when plan B is relatively neglected?

Comment by sdm on Some AI research areas and their relevance to existential safety · 2020-11-20T18:10:34.056Z · LW · GW

Thanks for this long and very detailed post!

The MARL projects with the greatest potential to help are probably those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment, because of its potential to minimize destructive conflicts between fleets of AI systems that cause collateral damage to humanity.  That said, even this area of research risks making it easier for fleets of machines to cooperate and/or collude at the exclusion of humans, increasing the risk of humans becoming gradually disenfranchised and perhaps replaced entirely by machines that are better and faster at cooperation than humans.

In ARCHES, you mention that just examining the multiagent behaviour of RL systems (or other systems that work as toy/small-scale examples of what future transformative AI might look like) might enable us to get ahead of potential multiagent risks, or at least try to predict how transformative AI might behave in multiagent settings. The way you describe it in ARCHES, the research would be purely exploratory,

One approach to this research area is to continually examine social dilemmas through the lens of whatever is the leading AI development paradigm in a given year or decade, and attempt to classify interesting behaviors as they emerge. This approach might be viewed as analogous to developing “transparency for multi-agent systems”: first develop interesting multi-agent systems, and then try to understand them.

But what you're suggesting in this post, 'those that find ways to achieve cooperation between decentrally trained agents in a competitive task environment', sounds like combining computational social choice research with multiagent RL -  examining the behaviour of RL agents in social dilemmas and trying to design mechanisms that work to produce the kind of behaviour we want. To do that, you'd need insights from social choice theory. There is some existing research on this, but it's sparse and very exploratory.

My current research is attempting to build on the second of these.

As far as I can tell, that's more or less it in terms of examining RL agents in social dilemmas, so there may well be a lot of low-hanging fruit and interesting discoveries to be made. If the research is specifically about finding ways of achieving cooperation in multiagent systems by choosing the correct (e.g. voting) mechanism, is that not also computational social choice research, and therefore of higher priority by your metric?

In short, computational social choice research will be necessary to legitimize and fulfill governance demands for technology companies (automated and human-run companies alike) to ensure AI technologies are beneficial to and controllable by human society.  

...

CSC neglect:

As mentioned above, I think CSC is still far from ready to fulfill governance demands at the ever-increasing speed and scale that will be needed to ensure existential safety in the wake of “the alignment revolution”. 

Comment by sdm on The 300-year journey to the covid vaccine · 2020-11-10T13:03:45.396Z · LW · GW

The remedies for all our diseases will be discovered long after we are dead; and the world will be made a fit place to live in, after the death of most of those by whose exertions it will have been made so. It is to be hoped that those who live in those days will look back with sympathy to their known and unknown benefactors.

— John Stuart Mill, diary entry for 15 April 1854

Comment by sdm on Covid 11/5: Don’t Mention the War · 2020-11-06T22:37:19.523Z · LW · GW

Very glad you took on board my objections re Fauci and signalling explanations, SL 2 vs SL 3. I don't disagree with your analysis of 'herd immunity' and agree that 'slowing the grind up to herd as much as possible to maximise the amount of vaccination that can occur' - what was recommended by some of the sources in my last post - is a viable strategy. But what that means in practice is still trying almost everything and not anything like the 'focused protection' or 'let it rip' strategy.

I can't speak for the US modellers but the best modellers I follow (Neil Ferguson and Adam Kucharski in the UK, mobile.twitter.com/AdamJKucharski and mobile.twitter.com/neil_ferguson) are extremely aware of the Rt dispersion issue that lowers the HIT (herd immunity threshold); their models put it at the top end of your range (50%) and say that it's temporary and shifts as contact patterns change. They also estimate the IFR as more like 0.6% (though it was near 1% in the UK in the first wave). I think that's the current consensus, but the numbers you give aren't out of the question.
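For anyone unfamiliar with why dispersion matters here, a minimal sketch (my own illustration with made-up parameter values, not anything from those modellers):

```python
# In the textbook homogeneous-mixing model the herd immunity threshold is
# 1 - 1/R0. Adding individual variation in contacts/susceptibility lowers it;
# the second formula is one published toy result (gamma-distributed
# susceptibility with coefficient of variation cv), used purely to show the
# direction and rough size of the effect.
def homogeneous_hit(r0):
    return 1 - 1 / r0

def heterogeneous_hit(r0, cv):
    return 1 - (1 / r0) ** (1 / (1 + cv ** 2))

print(homogeneous_hit(3.0))         # ~0.67: the classic 60-70% figure
print(heterogeneous_hit(3.0, 1.0))  # ~0.42: heterogeneity pulls the HIT toward ~50% or below
```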

Comment by sdm on AGI safety from first principles: Goals and Agency · 2020-11-02T18:00:56.649Z · LW · GW

Furthermore, we should take seriously the possibility that superintelligent AGIs might be even less focused than humans are on achieving large-scale goals. We can imagine them possessing final goals which don’t incentivise the pursuit of power, such as deontological goals, or small-scale goals. 

...

My underlying argument is that agency is not just an emergent property of highly intelligent systems, but rather a set of capabilities which need to be developed during training, and which won’t arise without selection for it

Was this line of argument inspired by Ben Garfinkel's objection to the 'classic' formulation of instrumental convergence/orthogonality - that these are 'measure based' arguments that just identify that a majority of possible agents with some agentive properties and large-scale goals will optimize in malign ways, rather than establishing that we're actually likely to build such agents?

It seems like you're identifying the same additional step that Ben identified, and that I argued could be satisfied - that we need a plausible reason why we would build an agentive AI with large-scale goals.

And the same applies for 'instrumental convergence' - the observation that most possible goals, especially simple goals, imply a tendency to produce extreme outcomes when ruthlessly maximised:

  • A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.  

We could see this as marking out a potential danger - a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist 'weakly suggest' (Ben's words) that AGI poses an existential risk since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we're 'shooting into the dark' in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world. There are specific reasons to think this might occur (e.g. mesa-optimisation, sufficiently fast progress preventing us from course-correcting if there is even a small initial divergence) but those are the reasons that combine with instrumental convergence to produce a concrete risk, and have to be argued for separately.
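To make the quoted 'unconstrained variables' point concrete, here is a tiny toy illustration (mine, not from the post): the objective only mentions one variable, but because a second, unscored variable limits what the first can achieve, blind optimisation drives it to the edge of whatever range the search is allowed to explore.

```python
# Toy example: random-search "optimiser" over two variables. Only
# "performance" appears in the objective, but performance is capped by
# "resources", which the objective never mentions - so the best solutions
# found all sit at the extreme of the resources range.
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    performance, resources = x
    return min(performance, resources)   # resources bound achievable performance

candidates = rng.uniform(0, 10, size=(20000, 2))
best = max(candidates, key=objective)
print(best)   # both entries end up near 10: the unscored variable is pushed to its bound
```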

Comment by sdm on SDM's Shortform · 2020-10-30T17:04:06.134Z · LW · GW

I think that the notion of Simulacra Levels is both useful and important, especially when we incorporate Harry Frankfurt's idea of Bullshit

Harry Frankfurt's On Bullshit seems relevant here. I think it's worth trying to incorporate Frankfurt's definition as well, as it is quite widely known (see e.g. this video). If you were to do so, I think you would say that on Frankfurt's definition, Level 1 tells the truth, Level 2 lies, Level 3 bullshits about physical facts but will lie or tell the truth about things in the social realm (e.g. others' motives, your own affiliation), and Level 4 always bullshits.

How do we distinguish lying from bullshit? I worry that there is a tendency to adopt self-justifying signalling explanations, where an internally complicated signalling explanation that's hard to distinguish from a simpler 'lying' explanation gets accepted, not because it's a better explanation overall but just because it has a ready answer to any objection. If 'Social cognition has been the main focus of Rationality' is true, then we need to be careful to avoid overusing such explanations. Stefan Schubert explains how this can end up happening:

...

It seems to me that it’s pretty common that signalling explanations are unsatisfactory. They’re often logically complex, and it’s tricky to identify exactly what evidence is needed to demonstrate them.

And yet even unsatisfactory signalling explanations are often popular, especially with a certain crowd. It feels like you’re removing the scales from our eyes; like you’re letting us see our true selves, warts and all. And I worry that this feels a bit too good to some: that they forget about checking the details of how the signalling explanations are supposed to work. Thus they devise just-so stories, or fall for them.

This sort of signalling paradigm also has an in-built self-defence, in that critics are suspected of hypocrisy or naïveté. They lack the intellectual honesty that you need to see the world for what it really is, the thinking goes

Comment by sdm on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-30T14:28:12.882Z · LW · GW

It may well be a crux. An efficient 'tree search' or similar goal-directed wrapper around a GPT-based system, one that could play a role in real-world open-ended planning (presumably planning for an agent to bring about outcomes in the real world via its text generation), would have to cover continuous action spaces and states containing unknown and shifting sets of possible actions - unlike the discrete and (relative to the real universe) small action space of Go, which is perfect for a tree search. It would also have to run, or approximate running, millions of primitive steps (individual text generations and exchanges) into the future, for long-term planning towards e.g. a multi-decade goal of the kind humans are capable of.

That sounds like a problem that's at least as hard as a language-model 'success probability predictor' GPT-N (probably with reward-modelling help, so it can optimize for a specific goal with its text generation). Though such a system would still be highly transformative, if it was human-level at prediction.

To clarify, this is Transformative not 'Radically Transformative' - transformative like Nuclear Power/Weapons, not like a new Industrial Revolution or an intelligence explosion.

I would expect tree search powered by GPT-6 to be probably pretty agentic.

I could imagine (if you found a domain with a fairly constrained set of actions and states, but one that involved text prediction somehow) that you could get agentic behaviour out of a tree search like the ones we currently have + GPT-N + an RL wrapper around the GPT-N. That might well be quite transformative - I could imagine it being very good for persuasion, for example.
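For concreteness, here is a deliberately crude sketch of the kind of wrapper being discussed - a beam-style search over continuations scored by a separate 'success probability' model. Every function here is a hypothetical stand-in (not a real API), just to show the shape of the loop.

```python
import random

# Hypothetical stand-ins, not a real API: a language model that proposes
# continuations and a learned critic that scores how likely a trajectory is
# to achieve the goal.
def sample_continuations(text, n):
    return [f" [continuation {i}]" for i in range(n)]

def success_probability(text):
    return random.random()

def plan(prompt, depth=3, branch=4):
    """Beam search over text continuations, keeping the best-scoring ones."""
    frontier = [(prompt, 0.0)]
    for _ in range(depth):
        expanded = []
        for text, _ in frontier:
            for cont in sample_continuations(text, branch):
                candidate = text + cont
                expanded.append((candidate, success_probability(candidate)))
        frontier = sorted(expanded, key=lambda pair: pair[1], reverse=True)[:branch]
    return frontier[0][0]

print(plan("Goal: persuade the committee."))
```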

Comment by sdm on Covid Covid Covid Covid Covid 10/29: All We Ever Talk About · 2020-10-30T14:00:08.215Z · LW · GW

In the UK these numbers and their rates relative to the first wave are plastered over the front-page of BBC news.

Comment by sdm on Open & Welcome Thread – October 2020 · 2020-10-30T13:45:47.316Z · LW · GW

I don't know Wei Dai's specific reasons for having such a high level of concern, but I suspect that they are similar to the arguments given by the historian Niall Ferguson in this debate with Yascha Mounk on how dangerous 'cancel culture' is. Ferguson likes to try and forecast social and cultural trends years in advance and thinks that he sees a cultural-revolution like trend growing unchecked.

Ferguson doesn't give an upper bound on how bad he thinks things could get, but he thinks 'worse than McCarthyism' is reasonable to expect over the next few years, because he thinks that 'cancel culture' has more broad cultural support and might also gain hard power in institutions.

Now - I am more willing to credit such worries than I was a year ago, but there's a vast gulf between a trend being concerning and expecting another Cultural Revolution. It feels too much like a direct linear extrapolation fallacy - 'things have become worse over the last year, imagine if that keeps on happening for the next six years!' I wasn't expecting a lot of what happened over the last eight months in the US on the 'cancel culture' side, but I think that a huge amount of this is due to a temporary, Trump- and Covid- and Recession-related heating up of the political discourse, not a durable shift in soft power or people's opinions. I think the opinion polls back this up. If I'm right that this will all cool down, we'll know in another year or so.

I also think that Yascha's arguments in that debate about the need for hard institutional power that's relatively unchecked, to get a Cultural-Revolution like outcome, are really worth considering. I don't see any realistic path to that level of hard, governmental power at enough levels being held by any group in the US.

Comment by sdm on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-29T20:44:59.458Z · LW · GW

I think that it could plausibly be quite transformative in a TAI sense and occur over the next ten years, so perhaps we don't have all that much of a disagreement on that point. I also think (just because we don't have an especially clear idea of  how modular intelligence is) that it could be quite uniform and a text predictor could surprise us with humanlike planning.

Maybe the text predictor by itself wouldn't be an agent, but the text predictor could be re-trained as an agent fairly easily, or combined into a larger system that uses tree search or something and thus is an agent. 

This maybe reflects a difference in intuition about how difficult agentive behaviour is to reach, rather than language understanding. I would expect a simple tree search algorithm powered by GPT-6 to be... a model with humanlike language comprehension and incredibly dumb agentive behaviour, and I expect it wouldn't be able to leverage the 'intelligence' of the language model in any significant way, because I see that as a separate problem requiring separate, difficult work. But I could be wrong.

I think there is a potential bias in that human-like language understanding and agentive behaviour have always gone together in human beings - we have no idea what a human-level language model that wasn't human-level intelligent would be like. Since we can't imagine it, we tend to default to imagining a human-in-a-box. I'm trying to correct for this bias by imagining that it might be quite different.

Comment by sdm on Covid Covid Covid Covid Covid 10/29: All We Ever Talk About · 2020-10-29T20:18:28.926Z · LW · GW

If you are keeping schools open in light of the graphs above, and think you are not giving up, I don’t even know how to respond.

I think the French lockdown probably won't work without school closures, and this will probably be noticed soon when the data comes through establishing that it doesn't work. I also think it's extremely dumb not to close schools, given that the risk of closing vs not closing at this point is extremely asymmetric. But this isn't 'giving up' knowingly (and I infer that you're suggesting Macron may be trying to show that he is trying while actually giving up) - this is simply Macron and his cabinet not intuitively understanding asymmetric risk, and not realizing that it's much better to do far more than what would have been sufficient than to do something that just stands an okay chance of being sufficient to suppress, in order to avoid costs later.

I think that there is a current tendency - and I see it in some of your statements about the beliefs of the 'doom patrol' - to use signalling explanations almost everywhere, and sometimes that shades into accepting a lower burden of proof, even if the explanation doesn't quite fit. For example, the European experience over the summer is mostly a story of a hideous but predictable failure to understand the asymmetric risk and costs of opening up / investing more vs less in tracing, testing and enforcement.

Signalling plays a role in explaining this irrationality, certainly, but as I explained in last week's comment, wedging everything into a box of 'signalling explanations' doesn't always work. Maybe it makes more sense in the US, where the coronavirus response has been much more politicised. Stefan Schubert has a great blog post on this tendency:

It seems to me that it’s pretty common that signalling explanations are unsatisfactory. They’re often logically complex, and it’s tricky to identify exactly what evidence is needed to demonstrate them.

And yet even unsatisfactory signalling explanations are often popular, especially with a certain crowd. It feels like you’re removing the scales from our eyes; like you’re letting us see our true selves, warts and all. And I worry that this feels a bit too good to some: that they forget about checking the details of how the signalling explanations are supposed to work. Thus they devise just-so stories, or fall for them.

This sort of signalling paradigm also has an in-built self-defence, in that critics are suspected of hypocrisy or naïveté. They lack the intellectual honesty that you need to see the world for what it really is, the thinking goes

I think that a few of your explanations fall into this category.

They’re pushing the line that even after both of you have an effective vaccine you still need to socially distance.

Isn't this... true? Given that an effective vaccine will take time to distribute (best guess 25 million doses by early next spring), and that there will be a long period where we're approaching herd immunity and the risk is steadily decreasing as more people become immune, Fauci is probably worried about people risk-compensating during this interval, so he's trying to emphasise that a vaccine won't be perfectly protective and might take a while - maybe exaggerating both claims, while not outright lying. I agree that this type of thinking can shade into doom-mongering and sometimes outright lying about how long vaccines might take, but this seems like solidly consequentialist lying to promote social distancing (SL 2), not bullshitting (SL 3). Maybe they've gotten the behavioural response wrong, and it's much better to be truthful, clear and give people reasonable hope (I think it is), but that's a difference in strategy, not pure SL3 bullshit. Why are you so confident that it's the latter?

I don’t think this is something being said in order to influence behavior, or even to influence beliefs. That is not the mindset we are dealing with at this point. It’s not about truth. It’s not about consequentialism. We have left not only simulacra level 1 but also simulacra level 2 fully behind. It’s about systems that instinctively and continuously pull in the direction of more fear, more doom, more warnings, because that is what is rewarded and high status and respectable and serious and so on, whereas giving people hope of any kind is the opposite. That’s all this is.

That's a bold claim to make about someone with a history like Fauci's, and since 'the priority with first vaccinations is to prevent symptoms and preventing infection is a bonus' is actually true, if misleading, I don't think it's warranted.

This just sounds exactly like generic public health messaging aimed at getting people to wear masks now by making them not focus on the prospect of a vaccine. Plus it might even be important to know, especially when you consider that vaccination will happen slowly and Fauci doesn't want people to risk-compensate after some people around them have been vaccinated but they haven't been. I don't think Fauci is thinking beyond saying whatever he needs to say to drive up mask compliance right now, which is SL 2. Your explanation that Dr Fauci has lost track of whether or not vaccines actually prevent infection might be true - but it strikes me as weird and confusing, something you'd expect of a more visibly disordered person, and the kind of thing you'd need more evidence for than what he said in that little clip. I think those explanations absolutely have their place, especially for explaining some horrible public health messaging by some politicians, public-facing experts and most of the media, but I think this particular example is overuse of signalling explanations in the way argued against in the article I linked above. At the very least I think the SL2 consequentialist lying explanation is simpler and has a plausible story behind it, so I don't know why you'd go for the less clear SL3 explanation with apparent certainty.

Essentially, Europe chose to declare victory and leave home without eradication, and the problem returned, first slowly, now all at once, as it was bound to do without precautions.

We did take plenty of precautions; they were just wholly inadequate relative to the potential damage of a second wave. A lot of this was not understanding the asymmetric risk. Most of Europe had precautions that might work, testing and tracing systems that were catching some of the infected, and various shifting rules about social distancing, and it was at least unclear if they would be sufficient. I can't speak about other countries, but people in the UK were intellectually extremely nervous about the reopening, and polls consistently showed most people saying it was too soon to reopen. For a while it worked - including in July, when there was a brief increase in the UK that was reversed successfully. The number of people I see around me wearing masks has been increasing steadily ever since the start of the pandemic. So it was easy and convenient to say 'it's a risk worth taking, it's worked out so far', at least for a while - even though any sane calculation of the risks should have said we ought to have invested vastly more than we did in testing, tracing, enforcement, supported isolation etc., even if things looked like they were under control.

Not that giving up is obviously the wrong thing to do! But that does not seem to be Macron’s plan.

...

We are going to lock you down if you misbehave, so if you misbehave all you’re doing is locking yourself down. She’s right, of course, that things will keep getting worse until we change the trajectory and make them start getting better, but no the interventions to regain control are exactly the same either way. You either get R below 1, or you don’t. Except that the more it got out of control first, the more voluntary adjustments you’ll see, and the more people will be immune, so the more out of control it gets the easier it is to control later. ...

And also the longer you wait, the longer you have to spend with stricter measures.

The measures don't need to be stricter unless you can't tolerate spending as long with high infection rates, in which case you need infection rates to go down much faster. I don't know if it makes me and Tyler Cowen and most epidemiologists part of the 'doom patrol' if we say that the more you wait, the longer the interval of either voluntary behaviour change to avoid infection or lockdown you'll need.

(Note that I'm not denying that there are such doomers. Some of the things you mention, like people explicitly denying that coronavirus treatment has made the disease less deadly and left hospitals much better able to cope, aren't really things in Europe or the UK, and I was amazed to learn people in the US are claiming things that insane - but we have our own fools demanding pointless sacrifices. Witness the recent ban Wales put on buying 'nonessential goods' within supermarkets.)

If by 'giving up' you mean 'not changing the government mandated measures currently on offer to be more like a lockdown', given the situation France is in right now, it seems undeniably the wrong thing to do to rely on voluntary behaviour changes and hope that there's no spike that overwhelms hospitals (again, asymmetric risk!) - worse for the economy, lives and certainly for other knock-on effects like hospital overloading. A lot of estimations of the marginal cost of suppression measures completely miss the point that the costs and benefits just don't separate out neatly, as I argue here. Tyler Cowen:

I think back to when I was 12 or 13, and asked to play the Avalon Hill board game Blitzkrieg.  Now, as the name might indicate, you win Blitzkrieg by being very aggressive.  My first real game was with a guy named Tim Rice, at the Westwood Chess Club, and he just crushed me, literally blitzing me off the board.  I had made the mistake of approaching Blitzkrieg like chess, setting up my forces for various future careful maneuvers.  I was back on my heels before I knew what had happened.

Due to its potential for exponential growth, Covid-19 is more like Blitzkrieg than it is like chess.  You are either winning or losing (badly), and you would prefer to be winning.  A good response is about trying to leap over into that winning space, and then staying there.  If you find that current prevention is failing a cost-benefit test, that doesn’t mean the answer is less prevention, which might fail a cost-benefit test all the more, due to the power of the non-local virus multiplication properties to shut down your economy and also take lives and instill fear.

You still need to come up with a way of beating Covid back.

'Giving up' is not actually giving up. At least in Europe, given the state of public behaviour and opinion about the virus, 'giving up' just means Sweden's 'voluntary suppression' in practice. There is no outcome where we uniformly line up to variolate ourselves and smoothly approach herd immunity. The people who try to work out the costs and benefits of 'lockdowns' are making a meaningless false comparison between 'normal economy' and 'lockdown':

First and foremost, the declaration does not present the most important point right now, which is to say October 2020: By the middle of next year, and quite possibly sooner, the world will be in a much better position to combat Covid-19. The arrival of some mix of vaccines and therapeutics will improve the situation, so it makes sense to shift cases and infection risks into the future while being somewhat protective now. To allow large numbers of people today to die of Covid, in wealthy countries, is akin to charging the hill and taking casualties two days before the end of World War I.

...

What exactly does the word “allow” mean in this context? Again the passivity is evident, as if humans should just line up in the proper order of virus exposure and submit to nature’s will. How about instead we channel our inner Ayn Rand and stress the role of human agency? Something like: “Herd immunity will come from a combination of exposure to the virus through natural infection and the widespread use of vaccines. Here are some ways to maximize the role of vaccines in that process.”

In that sense, as things stand, there is no “normal” to be found. An attempt to pursue it would most likely lead to panic over the numbers of cases and hospitalizations, and would almost certainly make a second lockdown more likely. There is no ideal of liberty at the end of the tunnel here.

In Europe, we will have more lockdowns. I'm not making the claim that this is what we should do, or that this is what's best for the economy given the dreadful situation we've landed ourselves in, or that it's what we'll almost certainly end up doing given political realities - though I think these are all true. What I'm saying is that, whether (almost certainly) by governments caving to political pressure or (if they hold out endlessly like Sweden) by voluntary behaviour change, we'll shut down the economy in an attempt to avoid catching the virus. Anything else is inconceivable and would require lemming-like behaviour from politicians and ordinary people.

So, given that it's going to happen, would you rather it be chaotic and late and uncoordinated, or sharper and earlier and hopefully shorter? If we're talking about government policy, there really isn't all that much compromise on the marginal costs of lockdowns vs the economy to be had if you're currently in the middle of a sufficiently rapid acceleration.

Comment by sdm on "Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} · 2020-10-29T18:58:56.093Z · LW · GW

I'm still a bit puzzled by the link between human level on text prediction and 'human level' unconditionally - if I recall our near-bet during the forecasting tournament, our major disagreement was on whether direct scaling of GPT-like systems takes us near to AGI. I often think that (because we don't have direct experience with any verbal intelligences in capability between GPT-3 and human brains) our intuitions are impoverished when we try to think about such intelligences. I imagine that a GPT-6 that is almost 'human level on text prediction' could still be extremely deficient in other areas - it would be very weird to converse with, maybe like an amnesiac or confabulator that's very articulate and has good short-term memory.

If language models scale to near-human performance but the other milestones don't fall in the process, and my initial claim is right, that gives us very transformative AI but not AGI. I think that the situation would look something like this:

If GPT-N reaches par-human, the milestones still left unresolved would be:

  • discovering new action sets
  • managing its own mental activity
  • (?) cumulative learning

while the milestones it would plausibly deliver are:

  • human-like language comprehension
  • perception and object recognition
  • efficient search over known facts

So there would be 2 (maybe 3?) breakthroughs remaining. It seems like you think just scaling up a GPT will also resolve those other milestones, rather than just giving us human-like language comprehension. Whereas if I'm right, and those curves do extrapolate, what we would get at the end would be an excellent text generator - but it wouldn't be an agent, wouldn't be capable of long-term planning, and couldn't be accurately described as having a utility function over the states of the external world. I don't see any reason why trivial extensions of GPT would be able to do those things either, since they seem like problems that are just as hard as human-like language comprehension. GPT seems like it's also making some progress on cumulative learning, though it might need some RL-based help with that, but none at all on managing mental activity for long-term planning or discovering new action sets.

Comment by sdm on Security Mindset and Takeoff Speeds · 2020-10-29T18:47:16.938Z · LW · GW

In terms of inferences about deceptive alignment, it might be useful to go back to the one and only current example we have where someone with somewhat relevant knowledge was led to wonder whether deception had taken place - GPT-3 balancing brackets. I don't know if anyone ever got Eliezer's $1000 bounty, but the top-level comment on that thread at least convinces me that it's unlikely that GPT-3 via AI Dungeon was being deceptive even though Eliezer thought there was a real possibility that it was.

Now, this doesn't prove all that much, but one thing it does suggest is that on current MIRI-like views about how likely deception is, the threshold for uncertainty about deception is set far too low. That suggests your people at OpenSoft might well be right in their assumption.

Comment by sdm on Have the lockdowns been worth it? · 2020-10-20T13:45:48.275Z · LW · GW
  • How long do we expect to have to wait for a vaccine or much more effective treatment? 

I can't think of a better source on this than the Good Judgment project's COVID-19 recovery dashboard.

  • How does the economic and related damage vary for voluntary vs involuntary suppression?

This is incredibly complicated and country-specific and dependent on all sorts of factors but maybe this graph from the Financial Times is a good place to start, it tells us how things have gone so far.

  • How does the total number and spread of infections vary for voluntary vs involuntary suppression? 

This is even harder than the previous question. All we can say for sure is 'It was possible to get R<1 in Sweden in the spring with less stringent measures'. If you consider that Sweden suffered considerably more death than its comparable neighbours, then you can project that the initial surge in deaths in badly-hit locked-down countries like the UK could have been much higher with voluntary measures, but how much higher is difficult to assess. I think that between-country comparisons are almost useless in these situations.

This is also where accounting for coronavirus deaths and debilitations comes into play. 'Anti-lockdown' arguments sometimes focus on the fact that even in badly-hit countries, the excess death figures have been in the rough range of +10% (though with around 11 years of life lost per death). There are ways of describing this that make it seem 'not so bad' or 'not worth shutting the country down for', by e.g. comparing it to deaths from the other leading causes of death, like heart disease. This assumes there's a direct tradeoff where we can 'carry on as normal' while accepting those deaths and avoid the economic damage, but there is no such tradeoff to be made. There's just the choice as to which way you place the additional nudges of law and public messaging on top of a trajectory you're largely committed to by individual behaviour changes.

And if you do try to make the impossible, hypothetical 'trade off the economy against lives' comparison between 'normal behaviour no matter what' and virus suppression, then the number of excess deaths to use for comparison isn't the number we in fact suffered, but far higher: given an IFR of 0.5-1%, it's on the order of +100% excess deaths (roughly 600,000 in the UK and 2 million in the US).
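Rough arithmetic behind those numbers (my own back-of-the-envelope, assuming most of the population gets infected on the way to herd immunity; population figures rounded):

```python
# Back-of-the-envelope: deaths if ~60-80% of the population were infected,
# at an infection fatality rate of 0.5-1%. Illustrative only.
for country, population in [("UK", 67e6), ("US", 330e6)]:
    low = population * 0.6 * 0.005
    high = population * 0.8 * 0.01
    print(f"{country}: {low/1e6:.1f}M - {high/1e6:.1f}M deaths")
# UK: ~0.2M - 0.5M; US: ~1.0M - 2.6M, i.e. the order of magnitude quoted above.
```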

But again, such a comparison isn't useful, as it's not a policy that could be enacted or adopted, in fact it would probably require huge state coercion to force people to return to 'normal life'.

The basic point that it wouldn't be worth sacrificing everything to reduce excess deaths by 10% and save a million life-years is true, but that point gets turned into a motte-and-bailey, where the motte is that there exists a level of damage at which a particular suppression measure (full lockdowns) is no longer worth it, and the bailey is that in the situations we are in now, most suppression measures are not worth it.

  • To what degree do weaker legally mandated measures earlier spare us from stronger legally mandated measures (or greater economic damage from voluntary behaviour change) later?

This raises the difficult question of how much to take into account panic over overwhelmed hospitals and rising cases. Tyler Cowen:

In that sense, as things stand, there is no “normal” to be found. An attempt to pursue it would most likely lead to panic over the numbers of cases and hospitalizations, and would almost certainly make a second lockdown more likely.

Comment by sdm on The Treacherous Path to Rationality · 2020-10-19T17:07:22.990Z · LW · GW

The Rationality community was never particularly focused on medicine or epidemiology. And yet, we basically got everything about COVID-19 right and did so months ahead of the majority of government officials, journalists, and supposed experts.

...

We started discussing the virus and raising the alarm in private back in January. By late February, as American health officials were almost unanimously downplaying the threat, we wrote posts on taking the disease seriously, buying masks, and preparing for quarantine.

...

The rationalists pwned COVID

This isn't true. We did see it coming more clearly than most of the governmental authorities and certainly were ahead of public risk communication, but we were on average fairly similar or even a bit behind the actual domain experts.

This article summarizes interviews with epidemiologists on when they first realized COVID-19 was going to be a huge catastrophe and how they reacted. The dates range from January 15th, with the majority in mid-late February. See also this tweet from late February, from a modeller on the UK's SAGE, confirming he thinks uncontrolled spread is taking place.

I have an email dated 27 Feb 2020 replying to a colleague: "My thoughts on Covid-19 - pandemic is very likely." It was such a dry, intellectual statement, and I remember feeling incredulous that I could write those words with such ease and certainty while feeling total uncertainty and fear about how this could play out.

...

Two moments stand out for me. One was in the first week of February, when I saw early signals that there could be substantial transmission before people show symptoms. Despite hopes of rapid containment, it was clear contact tracing alone would struggle to contain outbreaks

...

On January 23, I was at an NIH meeting related to potential pandemic pathogen research. Everyone had heard the news [that Wuhan had been locked down] and was beginning to discuss whether this would be a big deal. Over the next several weeks the concern turned more grave.

I believe February 27th was the same day as 'Seeing the Smoke', when it became accepted wisdom around here that coronavirus would be a huge catastrophe. Feb 27th was a day before I said I thought this would be a test-run for existential risk. And in late January, we were in the same position as the NIH, 'beginning to discuss whether this would be a big deal' without certainty. The crucial difference was understanding the asymmetric risk - A failure, but not of prediction.

So why didn't the domain experts do anything, if so? I've been reading the book Rage by Bob Woodward, which includes interviews with Fauci and other US officials from January and February. There was a constant emphasis, from as early as the end of December, on how demanding strict measures early would be 'useless' and achieve nothing!

I'm growing to think that a lot of health experts had an implicit understanding that the systems around them in the west were not equipped to carry out their best plans of action. In other words, they saw the smoke under the door, decided that if they yelled 'fire' before it had filled up the room nobody would believe them and then decided to wait a bit before yelling 'fire'. But since we weren't trying to produce government policy, we weren't subject to the same limitations.

Comment by sdm on Have the lockdowns been worth it? · 2020-10-19T16:43:04.491Z · LW · GW

An important consideration is that the 'thing that the US, UK and China have been doing, and what Sweden didn’t', may not refer to anything. There are two meanings of 'lockdowns have not been worth it' - 'allow the natural herd immunity to happen and carry on as normal, accepting the direct health damage while saving the economy' or 'we shouldn't adopt legally mandatory measures to attempt to suppress the virus and instead adopt voluntary measures to attempt to suppress the virus'. The latter of these is the only correct way to interpret 'thing Sweden did that the other countries didn't'. The first of these is basically a thought-experiment, not a possible state of affairs, because people won't carry on as usual. So it can't be used for cost-benefit comparisons.

In terms of behaviour, there is far more similarity between what the US and Sweden 'did' than what the US and China 'did'. Tyler Cowen has written several articles emphasising exactly this point. What Sweden 'did' was an uncoordinated, voluntary attempt at the same policy that China, Germany, the UK and the US attempted with varying levels of seriousness - social distancing to reduce the R effectively below 1, suppressing the epidemic. This thread summarizes the 'voluntary suppression' that countries like Sweden ended up with. Tyler Cowen writes an article attempting to 'right the wrong question':

"The most compassionate approach that balances the risks and benefits of reaching herd immunity, is to allow those who are at minimal risk of death to live their lives normally to build up immunity to the virus through natural infection, while better protecting those who are at highest risk. We call this Focused Protection."

What exactly does the word “allow” mean in this context? Again the passivity is evident, as if humans should just line up in the proper order of virus exposure and submit to nature’s will. How about instead we channel our inner Ayn Rand and stress the role of human agency? Something like: “Herd immunity will come from a combination of exposure to the virus through natural infection and the widespread use of vaccines. Here are some ways to maximize the role of vaccines in that process.”

So, the question cannot be "should we allow the natural herd immunity to happen and carry on as normal, accepting the direct health damage while protecting the economy" - that is not actually a possible state of affairs given human behaviour. We can ask whether a better overall outcome is achieved with legally required measures to attempt suppression, rather than an uncoordinated attempt at suppression, but since people will not carry on as normal we can't ask 'has the economic/knock-on cost of lockdowns been worth the lives saved' without being very clear that the counterfactual may not be all that different.

The most important considerations have to be,

  • How long do we expect to have to wait for a vaccine or much more effective treatment? If not long, then any weaker suppression is 'akin to charging the hill and taking casualties two days before the end of World War I'. If a long time, then we must recognise that in e.g. the US, a slow grind up to herd immunity through infection will eventually occur.
  • How does the economic and related damage vary for voluntary vs involuntary suppression? The example of Sweden compared to its neighbours is illustrative here.
  • How does the total number and spread of infections vary for voluntary vs involuntary suppression? You can't rerun history for a given country with vs without legally mandated suppression measures.
  • To what degree do weaker legally mandated measures earlier spare us from stronger legally mandated measures (or greater economic damage from voluntary behaviour change) later?
  • Edit: Tyler Cowen released another article arguing for a new consideration that I didn't list - what reference class to place Coronavirus in: 'external attack on the nation' or 'regular cause of death'. This matters because, for fairly clear rule-utilitarian/deontological reasons, governments should care more about defending their citizens from e.g. wars and terrorist attacks than from random accidents that kill similar numbers of people. I also think this is a key disagreement between pro/anti-'lockdown' positions.

To emphasise this last point, although it falls under 'questioning the question', the focus on lockdowns can be counterproductive when there are vastly more cost-effective measures that could have been attempted by countries like the UK that had very low caseloads through the summer - like funding enforcement of and support for isolation, better contact tracing, mask enforcement, and keeping events outdoors. These may fall under some people's definition of 'lockdown', since some of them are legally mandatory social distancing, but their costs and benefits are wildly different from those of stay-at-home orders. Scepticism of 'lockdowns' needs to be more specific about which measures it objects to.

Comment by sdm on Covid 10/15: Playtime is Over · 2020-10-17T15:13:25.876Z · LW · GW

The other group claims their goal is to save lives while preventing economic disaster. In practice, they act as if their goal was to destroy as much economic and social value as possible in the name of the pandemic as a Sacrifice to the Gods, and to pile maximum blame upon those who do not go along with this plan, while doing their best to slow down or block solutions that might solve the pandemic without sufficiently destroying economic or social value.

There are less cynical ways to view countermeasures that go too far. I'd compare them, especially early on, to many of us developing mild OCD because of how terrifying things were - compliance was also very high early on.

they act as if their goal was to destroy as much economic and social value as possible in the name of the pandemic as a Sacrifice to the Gods

...

they act as if their goal was to have everyone ignore the pandemic, actively flouting all precautions

A lot of the response in Europe/UK has not looked like this, or like your opposite side, but it still hasn't been very good.

The UK/Europe response has been more like an inefficient, clumsy attempt to strike a 'balance' between mitigation and saving the economy, while showing no understanding of how to make good tradeoffs - e.g. opening the universities while banning small gatherings. It looks more like an attempt to do all the 'good' things at once for the economy and health and get the reputational/mood-affiliation benefits from both. E.g. in the UK in summer we half-funded the tracing and isolation infrastructure, ignored that compliance was low, and gave subsidies to people eating out at pubs and restaurants after suppressing the virus hard and at great cost; now we might be employing incredibly costly lockdown measures again, when we could have fully squashed the virus with a bit of extra effort in the summer when numbers were almost zero - and that's the same story as most of Europe.

That's more a failure to understand/respond to opportunity costs than either of the failures you describe, though it has aspects of both. It doesn't look like they were acting with the goal of getting people to adhere to the costliest measures possible, though - witness the reluctance to reimpose restrictions now.

The pandemic has enough physical-world, simulacra-level-1 impact on people to steer most ordinary people’s individual physical actions towards what seems to them like useful ones that preserve economic and social value while minimizing health risks. And it manages to impose some amount of similar restrictions on the collective and rhetorical actions.

This is the part that I like to emphasise, and the reason that we're still bound for a better outcome than most March predictions implied is because of a decent level of public awareness of risk imposing a brake on the very worst outcomes - the Morituri Nolumus Mori. Many of us didn't properly anticipate how much physical reality would end up hemming in our actions, as I explained in that post.

That doesn’t mean equivalence between sides, let alone equivalence of individuals. But until the basic dynamics are understood, one can’t reasonably predict what will happen next.

This is also worth emphasising. In general, though not in the examples you mention from e.g. California, going too hard works better than going too soft because there just is no pure 'let it rip' option - there's a choice between coordinated and uncoordinated suppression. It looks like voluntary behaviour has (in Europe and the US) mattered relatively more than expected. Countries that relied on voluntary behaviour change like Sweden didn't have the feared uncontrolled spread but also didn't do that well - they ended up with a policy of effective ‘voluntary suppression’ with a slightly different tradeoff – economic damage slightly less than others, activity reduction slower and more chaotic, more deaths. This was essentially a collective choice by the Swedish people despite their government.

that’s probably not true, and probably not true sooner rather than later. Immunity and testing continue to increase, our treatments continue to improve, and vaccines are probably on their way on a timescale of months. Despite the best efforts of both camps, it would greatly surprise me if we are not past the halfway point.

The initial estimates said that 40-50% infected is a reasonable lower bound for when weak mitigation plus partial herd immunity would end the pandemic naturally. I think that's still true. So, it would all have been 'worth it', in pure death terms, if significantly fewer than that many people end up catching coronavirus before much better treatments or vaccines end the epidemic by other means. Last time I checked that's still likely.

Comment by sdm on A voting theory primer for rationalists · 2020-08-31T16:51:54.777Z · LW · GW
You seem to be comparing Arrow's theorem to Lord Vetinari, implying that both are undisputed sovereigns?

It was a joke: if you take Arrow's theorem literally, the fairest 'voting method' (at least among ranked voting methods) - the only rule which produces a definite transitive preference ranking while meeting the unanimity and independence conditions - is 'one man, one vote', i.e. dictatorship.

And frankly, I think that the model used in the paper bears very little relationship to any political reality I know of. I've never seen a group of voters who believe "I would love it if any two of these three laws pass, but I would hate it if all three of them passed or none of them passed" for any set of laws that are seriously proposed and argued-for.

Such a situation doesn't seem all that far-fetched to me. Suppose there are three different stimulus bills on offer, and you want some stimulus spending but you also care about rising national debt. You don't much care which bills pass, and you do want some stimulus money, but you also don't want all of them to pass because you think the debt would rise too high - so you decide that you want exactly 2 out of the 3 to pass. But I think the methods introduced in that paper might be most useful not for modelling the outcomes of voting systems, but for attempts to align an AI to multiple people's preferences.
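To make this kind of non-separable preference concrete, here's a minimal toy sketch (all numbers invented for illustration) of a voter whose utility over sets of passed bills peaks at exactly two of three:

```python
from itertools import combinations

# Purely illustrative utilities: each bill passing gives some stimulus benefit,
# but debt costs grow and a penalty kicks in if more than two bills pass.
bills = {"A": 1.0, "B": 1.0, "C": 1.0}   # benefit of each bill
debt_cost_per_bill = 0.4                 # mild cost of extra borrowing per bill
debt_penalty = 2.5                       # large extra cost beyond two bills

def utility(passed):
    """This hypothetical voter's utility for a given set of passed bills."""
    benefit = sum(bills[b] for b in passed)
    cost = debt_cost_per_bill * len(passed)
    if len(passed) > 2:
        cost += debt_penalty
    return benefit - cost

for k in range(len(bills) + 1):
    for subset in combinations(bills, k):
        print(set(subset) or "no bills", round(utility(subset), 2))
# Any two bills score highest (1.2); all three score worst (-0.7), so the
# voter's preferences over individual bills are not separable.
```

The point is just that such bundle-level preferences can't be decomposed into independent yes/no preferences over each bill.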

Comment by sdm on Forecasting Thread: AI Timelines · 2020-08-29T11:11:04.240Z · LW · GW

I'll take that bet! If I do lose, I'll be far too excited/terrified/dead to worry in any case.

Comment by sdm on Covid 8/27: The Fall of the CDC · 2020-08-28T11:32:12.947Z · LW · GW
I’m still periodically scared in an existential or civilization-is-collapsing-in-general kind of way, but not in a ‘the economy is about to collapse’ or ‘millions of Americans are about to die’ kind of way. 
I’m not sure whether this is progress.

It definitely is progress. If we were in the latter situation, there would be nothing at all to do except hope you personally don't die, whereas in the former there's a chance for things to get better - if we learn the lesson.

By strange coincidence, it's exactly 6 months since I wrote this, and I think it's important to remember just how dire the subjective future seemed at the end of February - that (subjectively, anyway) could have happened, but didn't.

Comment by sdm on SDM's Shortform · 2020-08-28T10:50:18.165Z · LW · GW
The tl;dr is that instead of thinking of ethics as a single unified domain where "population ethics" is just a straightforward extension of "normal ethics," you split "ethics" into a bunch of different subcategories:
Preference utilitarianism as an underdetermined but universal morality
"What is my life goal?" as the existentialist question we have to answer for why we get up in the morning
"What's a particularly moral or altruistic thing to do with the future lightcone?" as an optional subquestion of "What is my life goal?" – of interest to people who want to make their life goals particularly altruistically meaningful

This is very interesting - I recall from our earlier conversation that you said you might expect some areas of agreement, just not on axiology:

(I say elements because realism is not all-or-nothing - there could be an objective 'core' to ethics, maybe axiology, and much ethics could be built on top of such a realist core - that even seems like the most natural reading of the evidence, if the evidence is that there is convergence only on a limited subset of questions.)

I also agree with that, except that I think axiology is the one place where I'm most confident that there's no convergence. :)
Maybe my anti-realism is best described as "some moral facts exist (in a weak sense as far as other realist proposals go), but morality is underdetermined."

This may seem like an odd question, but are you possibly a normative realist, just not a full-fledged moral realist? What I didn't say in that bracket was that 'maybe axiology' wasn't my only guess about what the objective, normative facts at the core of ethics could be.

Following Singer in The Expanding Circle, I also think that some impartiality rule that leads to preference utilitarianism - maybe analogous to the anonymity rule in social choice - could be one of the normatively correct rules that ethics has to follow, but that if convergence among ethical views doesn't occur, the final answer might be underdetermined. This seems to be essentially the same as your view, so maybe we disagree less than it initially seemed.


In my attempted classification (of whether you accept convergence and/or irreducible normativity), I think you'd be somewhere between 1 and 3. I did say that those views might be on a spectrum depending on which areas of Normativity overall you accept, but I didn't consider splitting up ethics into specific subdomains, each of which might have convergence or not:

Depending on which of the arguments you accept, there are four basic options. These are extremes of a spectrum, as while the Normativity argument is all-or-nothing, the Convergence argument can come by degrees for different types of normative claims (epistemic, practical and moral)

Assuming that it is possible to cleanly separate population ethics from 'preference utilitarianism', it is consistent, though quite counterintuitive, to demand reflective coherence in our non-population ethical views but allow whatever we want in population ethics (this would be view 1 for most ethics but view 3 for population ethics).

(This still strikes me as exactly what we'd expect to see halfway to reaching convergence - the weirder and newer subdomain of ethics still has no agreement, while we have reached greater agreement on questions we've been working on for longer.)

It sounds like you're contrasting my statement from The Case for SFE ("fit all one’s moral intuitions into an overarching theory based solely on intuitively appealing axioms") with "arbitrarily halting the search for coherence" / giving up on ethics playing a role in decision-making. But those are not the only two options: You can have some universal moral principles, but leave a lot of population ethics underdetermined.

Your case for SFE was intended to defend a view of population ethics - that there is an asymmetry between suffering and happiness. If we've decided that population ethics is to remain undetermined - that is, if we adopt view 3 for population ethics - what is your argument (that SFE is an intuitively appealing explanation for many of our moral intuitions) meant to achieve? Can't I simply declare that my intuitions say otherwise, and then we have nothing more to discuss, if we already know we're going to leave population ethics undetermined?

Comment by sdm on Forecasting Thread: AI Timelines · 2020-08-26T14:35:28.173Z · LW · GW

The 'progress will be continuous' argument, as applied to our near future, does depend on my other assumptions - mainly that the breakthroughs on that list are separable, so that agentive behaviour and long-term planning won't drop out of a larger GPT by themselves and can't be considered part of just 'improving language model accuracy'.

We currently have partial progress on human-level language comprehension, a bit on cumulative learning, but near zero on managing mental activity for long term planning, so if we were to suddenly reach human level on long-term planning in the next 5 years, that would probably involve a discontinuity, which I don't think is very likely for the reasons given here.

If language models scale to near-human performance but the other milestones don't fall in the process, and my initial claim is right, that gives us very transformative AI but not AGI. I think that the situation would look something like this:

If GPT-N reaches par-human:

Still missing:
discovering new action sets
managing its own mental activity
(?) cumulative learning

Already covered (by GPT-N itself, or solved previously):
human-like language comprehension
perception and object recognition
efficient search over known facts

So there would be 2 (maybe 3?) breakthroughs remaining. It seems like you think just scaling up a GPT will also resolve those other milestones, rather than just giving us human-like language comprehension. Whereas if I'm right, and those curves do extrapolate, what we would get at the end is an excellent text generator - but it wouldn't be an agent, wouldn't be capable of long-term planning, and couldn't be accurately described as having a utility function over the states of the external world. I don't see any reason why trivial extensions of GPT would be able to do those things either, since they seem like problems that are just as hard as human-like language comprehension. GPT seems to be making some progress on cumulative learning, though it might need some RL-based help with that, but none at all on managing mental activity for long-term planning or on discovering new action sets.

As an additional argument, admittedly from authority - Stuart Russell also clearly sees human-like language comprehension as only one of several really hard and independent problems that need to be solved.

A humanlike GPT-N would certainly be a huge leap into a realm of AI we don't know much about, so we could be surprised and discover that agentive behaviour, and a utility function over states of the external world, spontaneously appear in a good enough language model. But that argument has to be made, and for us to reach AGI in the next five years you need both that argument to hold and GPT to keep scaling. I don't see the conjunction of those two as that likely - it seems as though your argument rests solely on whether GPT scales or not, when there's also this other conceptual premise that's much harder to justify.

I'm also not sure I've seen anyone make the argument that GPT-N will also give us these specific breakthroughs - but if you have reasons to think GPT scaling would solve all the remaining barriers to AGI, I'd be interested to hear them. Note that this isn't the same as just pointing out how impressive the results of scaling up GPT could be - Gwern's piece here, for example, seems to be arguing for a scenario more like what I've envisaged, where GPT-N ends up a key piece of some future AGI but just provides some of the background 'world model':

Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.

If GPT does scale, and we get human-like language comprehension in 2025, that will mean we're moving up that list much faster, and in turn suggests that there might not be a large number of additional discoveries required to make the other breakthroughs, which in turn suggests they might also occur within the Deep Learning paradigm, and relatively soon. I think that if this happens, there's a reasonable chance that when we do build an AGI a big part of its internals looks like a GPT, as gwern suggested, but by then we're already long past simply scaling up existing systems.

Alternatively, perhaps you're not including agentive behaviour in your definition of AGI - a par-human text generator for most tasks that isn't capable of discovering new action sets or managing its mental activity is, I think, a 'mere' transformative AI and not a genuine AGI.

Comment by sdm on SDM's Shortform · 2020-08-25T15:56:57.852Z · LW · GW

So to sum up, a very high-level summary of the steps in this method of preference elicitation and aggregation would be (a toy sketch of the whole loop follows the list):

    1. With a mixture of normative assumptions and multi-channel information (approval and actions) as inputs, use a reward-modelling method to elicit the debiased preferences of many individuals.
       - Determining whether there actually are significant differences between stated and revealed preferences when performing reward modelling is the first step to using multi-channel information to effectively separate biases from preferences.
    2. Create 'proxy agents' using the reward model developed for each human (this step is where intent-aligned amplification can potentially occur).
    3. Place the proxies in an iterated voting situation which tends to produce sensible convergent results. The use of RL proxies here can be compared to the use of human proxies in liquid democracy.
       - Which voting mechanisms tend to work in iterated situations with RL agents can be determined in other experiments (probably with purely artificial agents).
    4. Run the voting mechanism until an unambiguous winner is decided, using methods like those given in this paper.
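To make the shape of steps 1-4 concrete, here is a minimal, purely illustrative Python sketch. Everything in it is a stand-in: the 'reward models' are just random utility vectors rather than learned models, the proxies are simple epsilon-greedy bandit learners voting under plurality, and convergence is replaced by a fixed number of rounds. It is not the method from the paper cited in step 4, only the general shape of the pipeline.

```python
import random

random.seed(0)
N_HUMANS, N_OUTCOMES = 5, 4
ROUNDS, EPSILON, LR = 3000, 0.1, 0.1

# Step 1 stand-in: a "reward model" per human is just a random utility vector here.
# In the actual proposal this would come from reward modelling over actions + approval.
reward_models = [[random.random() for _ in range(N_OUTCOMES)] for _ in range(N_HUMANS)]

# Step 2: each proxy agent keeps Q-values over which outcome to vote for.
q_values = [[0.0] * N_OUTCOMES for _ in range(N_HUMANS)]

def plurality_winner(ballots):
    return max(set(ballots), key=ballots.count)

# Steps 3-4: iterated plurality voting; each proxy is rewarded with its human's
# utility for the winning outcome, so strategic voting can emerge over time.
winner = None
for _ in range(ROUNDS):
    ballots = []
    for q in q_values:
        if random.random() < EPSILON:
            ballots.append(random.randrange(N_OUTCOMES))                # explore
        else:
            ballots.append(max(range(N_OUTCOMES), key=lambda a: q[a]))  # exploit
    winner = plurality_winner(ballots)
    for i, vote in enumerate(ballots):
        reward = reward_models[i][winner]
        q_values[i][vote] += LR * (reward - q_values[i][vote])

print("Winning outcome after iterated voting:", winner)
```

A real version would replace the utility vectors with learned reward models, use a more carefully chosen base voting rule, and check for an unambiguous stable winner rather than running a fixed number of rounds.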

This seems like a reasonable procedure for extending a method that is aligned to one human's preferences (steps 1-2) to produce sensible results when trying to align to an aggregate of human preferences (steps 3-4). It reduces reliance on the specific features of any one voting method. Other than the insight that multiple channels of information might help, all the standard unsolved problems with preference learning from one human remain.

Even though we can't yet align an AGI to one human's preferences, trying to think about how to aggregate human preferences in a way that is scalable isn't premature, as has sometimes been claimed.

In many 'non-ambitious' hypothetical settings where we aren't trying to build an AGI sovereign over the whole world (for example, designing a powerful AI to govern the operations of a hospital), we still need to be able to aggregate preferences sensibly and stably. This method would do well at such intermediate scales, as it doesn't approach the question of preference aggregation from a 'final' ambitious value-learning perspective but instead tries to look at preference aggregation the same way we look at elicitation, with an RL-based iterative approach to reaching a result.

However, if you did want to use such a method to try to produce the fabled 'final utility function of all humanity', it might not give you Humanity's CEV, since some normative assumptions (that preferences count equally, and in the way given by the voting mechanism) are built in. By analogy with CEV, I called the idealized result of this method a coherent extrapolated framework (CEF). It is a more normatively direct method of aggregating values than CEV (since you fix a particular method of aggregating preferences in advance): it extrapolates from a voting framework rather than from our volition, more broadly (and vaguely) defined - hence CEF.

Comment by sdm on A voting theory primer for rationalists · 2020-08-25T13:00:09.261Z · LW · GW
Kenneth Arrow proved that the problem that Condorcet (and Llull) had seen was in fact a fundamental issue with any ranked voting method. He posed 3 basic "fairness criteria" and showed that no ranked method can meet all of them:
Ranked unanimity, Independence of irrelevant alternatives, Non-dictatorial

I've been reading up on voting theory recently, and Arrow's result is that the only ranked voting method which produces a definite transitive preference ranking, picks the unanimous answer if one exists, and doesn't change depending on irrelevant alternatives is 'one man, one vote' - that is, dictatorship.

“Ankh-Morpork had dallied with many forms of government and had ended up with that form of democracy known as One Man, One Vote. The Patrician was the Man; he had the Vote.”
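To see the problem Condorcet noticed in miniature, here's a small illustrative check (invented ballots) showing that pairwise majority preferences can be cyclic, which is the basic difficulty that Arrow's theorem generalises:

```python
from itertools import combinations

# Three voters with "rock-paper-scissors" rankings over candidates A, B, C.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def margin(x, y):
    """Voters preferring x to y, minus voters preferring y to x."""
    return sum(1 if b.index(x) < b.index(y) else -1 for b in ballots)

for x, y in combinations("ABC", 2):
    m = margin(x, y)
    print(f"{x} vs {y}: majority prefers {x if m > 0 else y}")
# The pairwise majorities form a cycle (A beats B, B beats C, C beats A),
# so "follow the majority" doesn't yield a transitive ranking here.
```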

In my opinion, aside from the utilitarian perspective offered by VSE, the key to evaluating voting methods is an understanding of strategic voting; this is what I'd call the "mechanism design" perspective. I'd say that there are 5 common "anti-patterns" that voting methods can fall into; either where voting strategy can lead to pathological results, or vice versa.

One recent extension to these statistical approaches is to use RL agents in iterated voting and examine their convergence behaviour. The idea is that we embrace the inevitable impossibility results (such as the Arrow and GS theorems) and treat agents' ability to vote strategically as an opportunity to reach stable outcomes. This paper uses very simple Q-learning agents with a few different policies (epsilon-greedy, greedy, and upper confidence bound) in an iterated voting game, and gets behaviour that seems sensible. Many thousands of rounds of iterated voting aren't practical for real-world elections, but for preference elicitation in other contexts (such as value learning) the approach might be useful as a way to try to estimate people's underlying utilities as accurately as possible.

Comment by sdm on Open & Welcome Thread - August 2020 · 2020-08-24T14:14:25.444Z · LW · GW

A first actually credible claim of coronavirus reinfection? Potentially good news as the patient was asymptomatic and rapidly produced a strong antibody response.

Comment by sdm on Forecasting Thread: AI Timelines · 2020-08-23T16:32:34.429Z · LW · GW

Here's my answer. I'm pretty uncertain compared to some of the others!

AI Forecast

First, I'm assuming that by AGI we mean an agent-like entity that can do the things associated with general intelligence, including things like planning towards a goal and carrying that out. If we end up in a CAIS-like world where there is some AI service or other that can do most economically useful tasks, but nothing with very broad competence, I count that as never developing AGI.

I've been impressed with GPT-3, and could imagine it or something like it scaling to produce near-human level responses to language prompts in a few years, especially with RL-based extensions.

But, following the list (below) of missing capabilities by Stuart Russell, I still think things like long-term planning would elude GPT-N, so it wouldn't be agentive general intelligence. Even though you might get those behaviours with trivial extensions of GPT-N, I don't think it's very likely.

That's why I think AGI before 2025 is very unlikely (not enough time for anything except scaling up of existing methods). This is also because I tend to expect progress to be continuous, though potentially quite fast, and going from current AI to AGI in less than 5 years requires a very sharp discontinuity.

AGI before 2035 or so happens if systems quite a lot like current deep learning - but which aren't just trivial extensions of it - can do the job. This seems reasonable to me on the inside view: e.g. it takes us less than 15 years to take GPT-N and add layers on top of it that handle things like planning and discovering new actions. This is probably my 'inside view' answer.

I put a lot of weight on a tail peaking around 2050 because of how quickly we've advanced up this 'list of breakthroughs needed for general intelligence' -

There is this list of remaining capabilities needed for AGI in an older post I wrote, with the capabilities of 'GPT-6' as I see them underlined:

Stuart Russell’s List

human-like language comprehension

cumulative learning

discovering new action sets

managing its own mental activity

For reference, I’ve included two capabilities we already have that I imagine being on a similar list in 1960

perception and object recognition

efficient search over known facts

So we'd have discovering new action sets and managing mental activity - effectively, the things that facilitate long-range complex planning - remaining.

So (very oversimplified): if around the 1980s we had efficient search algorithms, by 2015 we had image recognition (basic perception) and by 2025 we have language comprehension courtesy of GPT-8, that leaves cumulative learning (which could be obtained by advanced RL?), then discovering new action sets and managing mental activity (no idea). It feels a bit odd that we'd breeze past all the remaining milestones in one decade after it took ~6 decades to get to where we are now. Say progress has sped up to be twice as fast - then it's 3 more decades to go. Add to this the economic evidence from things like Modelling the Human Trajectory, which suggests a roughly similar time period of around 2050.

Finally, I think it's unlikely but not impossible that we never build AGI and instead go for tool AI or CAIS, most likely because we've misunderstood the incentives such that AGI isn't actually economical, or because agentive behaviour doesn't arise easily. Then there's the small (few percent) chance of a catastrophic or existential disaster which wrecks our ability to invent things. This is the part I'm most unsure about - I put 15% for both, but it may well be higher.

Comment by sdm on SDM's Shortform · 2020-08-23T15:57:40.177Z · LW · GW

I don't think that excuse works in this case - I didn't give it a 'long-winded frame', just that brief sentence at the start, and then the list of scenarios, and even though I reran it a couple of times on each to check, the 'cranberry/grape juice kills you' outcome never arose.

So, perhaps they switched directly from no prompt to an incredibly long-winded and specific prompt without checking what was actually necessary for a good answer? I'll point out that I didn't really attempt any sophisticated prompt programming either - that was literally the first sentence I thought of!

Comment by sdm on SDM's Shortform · 2020-08-23T12:31:37.767Z · LW · GW

Gary Marcus, noted sceptic of Deep Learning, wrote an article with Ernest Davis:

GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about

The article purports to give six examples of GPT-3's failure - Biological, Physical, Social, Object and Psychological reasoning and 'non sequiturs'. Leaving aside that GPT-3 works on Gary's earlier GPT-2 failure examples, and that it seems as though he specifically searched out weak points by testing GPT-3 on many more examples than were given, something a bit odd is going on with the results they gave. I got better results when running his prompts on AI Dungeon.

With no reruns, randomness = 0.5, I gave Gary's questions (all six gave answers considered 'failures' by Gary) to GPT-3 via AI Dungeon with a short scene-setting prompt, and got good answers to 4 of them, and reasonable vague answers to the other 2:

This is a series of scenarios describing a human taking actions in the world, designed to test physical and common-sense reasoning.
1) You poured yourself a glass of cranberry juice, but then you absentmindedly poured about a teaspoon of grape juice into it. It looks okay. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you take another drink.
2) You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway, so to get it into the living room, you will have to  move furniture. This means that some people will be inconvenienced.
3) You are a defense lawyer and you have to go to court today. Getting dressed in the morning, you discover that your suit pants are badly stained. However, your bathing suit is clean and very stylish. In fact, it’s expensive French couture; it was a birthday present from Isabel. You decide that you should wear it because you won't look professional in your stained pants, but you are worried that the judge will think you aren't taking the case seriously if you are wearing a bathing suit.
4) Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?
5) Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will prefer a bottom."
6) At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because it was a menthol, and it ruined the taste. So I added a little more sugar to counteract the menthol, and then I noticed that my cigarette had fallen into the glass and was floating in the lemonade.

For 1), Gary's example ended with 'you are now dead', whereas I got a reasonable, if short, continuation - success.

For 2), the answer is vague enough to be a technically correct solution ('move furniture' = tilt the table), but since we're being strict I'll count it as a failure. Gary's example was a convoluted attempt to saw the door in half - clearly mistaken.

3) is very obviously intended to trick the AI into endorsing the bathing suit answer - in fact it feels like a classic priming trick that might trip up a human! But in my version GPT-3 rebels against the attempt and notices the incongruity of wearing a bathing suit to court, so it counts as a success. Gary's example didn't include the worry that a bathing suit was inappropriate - arguably not a failure, but never mind, let's move on.

4) is actually a complete prompt by itself, so the AI didn't do anything - GPT-3 doesn't care about answering questions, just continuing text with high probability. Gary's answer was 'I have a lot of clothes', and no doubt he'd call both 'evasion', so to be strict we'll agree with him and count that as failure.

5) Trousers are called 'bottoms', so that's right. Gary would call it wrong since 'the intended continuation' was “He will make you take it back”, but that's absurdly unfair - that's not the only answer a human being might give - so I have to say it's correct. Gary's example 'lost track of the fact that Penny is advising Janet against getting a top', which didn't happen here, so that's acceptable.

Lastly, 6) is a slightly bizarre but logical continuation of an intentionally weird prompt - so correct. It also demonstrates correct physical reasoning - stirring a drink with a cigarette won't be good for the taste. Gary's answer wandered off-topic and started talking about cremation.

So, 4/6 correct on an intentionally deceptive and adversarial set of prompts, and that's on a fairly strict definition of correct. 2) and 4) are arguably not wrong, even if evasive and vague. More to the point, this was on an inferior version of GPT-3 to the one Gary used, the Dragon model from AI Dungeon!

I'm not sure what's going on here - is it the initial prompt saying it was 'testing physical and common sense reasoning'? Was that all it took?

Comment by sdm on Learning human preferences: optimistic and pessimistic scenarios · 2020-08-21T16:40:21.362Z · LW · GW

Glad you think so! I think that methods like using multiple information sources might be a useful way to reduce the number of (potentially mistaken) normative assumptions you need in order to model a single human's preferences.

The other area of human preference learning where you seem, inevitably, to need a lot of strong normative assumptions is in preference aggregation. If we assume we have elicited the preferences of lots of individual humans, and we're then trying to aggregate their preferences (with each human's preference represented by a separate model) I think the same basic principle applies, that you can reduce the normative assumptions you need by using a more complicated voting mechanism, in this case one that considers agents' ability to vote strategically as an opportunity to reach stable outcomes. 

I talk about this idea here. As with using approval/actions to improve the elicitation of an individual's preferences, you can't avoid making any normative assumptions by using a more complicated aggregation method, but perhaps you end up having to make fewer of them. Very speculatively, if you can combine a robust method of eliciting preferences with few inbuilt assumptions with a similarly robust method of aggregating preferences, you're on your way to a full solution to ambitious value learning.

Comment by sdm on SDM's Shortform · 2020-08-20T23:10:02.446Z · LW · GW

Modelling the Human Trajectory or ‘How I learned to stop worrying and love Hegel’.

Rohin’s opinion: I enjoyed this post; it gave me a visceral sense for what hyperbolic models with noise look like (see the blog post for this, the summary doesn’t capture it). Overall, I think my takeaway is that the picture used in AI risk of explosive growth is in fact plausible, despite how crazy it initially sounds.

One thing this post led me to consider is that when we bring together various fields, the evidence for 'things will go insane in the next century' is stronger than any specific claim about (for example) AI takeoff. What is the other evidence?

We're probably alone in the universe, and anthropic arguments tend to imply we're living at an incredibly unusual time in history. Isn't that what you'd expect to see in the same world where there is a totally plausible mechanism that could carry us a long way up this line, in the form of AGI and eternity in six hours? All the pieces are already there, and they only need to be approximately right for our lifetimes to be far weirder than those of people who were e.g. born in 1896 and lived to 1947 - which was weird enough, but that should be your minimum expectation.

In general, there are three categories of evidence that things are likely to become very weird over the next century, or that we live at the hinge of history:

  1. Specific mechanisms around AGI - possibility of rapid capability gain, and arguments from exploratory engineering

  2. Economic and technological trend-fitting predicting explosive growth in the next century

  3. Anthropic and Fermi arguments suggesting that we live at some extremely unusual time

All of these are evidence for such a claim. 1) counts because a superintelligent AGI takeoff is just one specific way the hinge could occur. 3) already directly argues for it, but how does 2) fit in with 1) and 3)?

There is something a little strange about calling a fast takeoff from AGI and whatever was driving superexponential growth throughout all of history the same trend. It would take a huge cosmic coincidence for there always to be superexponential growth: just as population growth plus growth in wealth per capita (or whatever was driving it until now) runs out in the great stagnation (which is visible as a tiny blip on the right-hand side of the double-log plot), AGI takes over and pushes us up the same trend line. That's clearly not plausible as a coincidence, so if AGI is what takes us up the rest of that trend line, there would have to be some single factor responsible for both - a factor that was at work in the founding of Jericho but predestined that AGI would be invented and cause explosive growth in the 21st century, rather than the 19th or the 23rd.

For AGI to be the driver of the rest of that growth curve, there has to be a single causal mechanism that keeps us on the same trend and includes AGI as its final step - if we say we are agnostic about what that mechanism is, we can still call 2) evidence for us living at the hinge point, though we have to note that there is a huge blank spot in need of explanation. Is there anything that can fill it to complete the picture?

The mechanism proposed in the article seems like it could plausibly include AGI.

If technology is responsible for the growth rate, then reinvesting production in technology will cause the growth rate to be faster. I'd be curious to see data on what fraction of GWP gets reinvested in improved technology and how that lines up with the other trends.

But even though the drivers seem superficially similar - they are both about technology - the claim here is that one very specific technology will generate explosive growth, not that technology in general will. It seems strange that AGI would follow the same growth curve as one caused by reinvesting more GWP in improving ordinary technology, which doesn't improve your own ability to think in the way that AGI would.
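As a toy illustration of the quoted mechanism (all numbers invented), here is what 'reinvesting output into technology, where returns to research scale with the technology level' looks like - the growth rate itself grows, which is the superexponential, hyperbolic-looking pattern the post fits:

```python
# Toy model: output is proportional to the technology level, a fixed fraction of
# output is reinvested in R&D, and returns to R&D scale with the technology level.
# In the continuous limit this is dA/dt = s * A^2, which blows up in finite time.

def simulate(reinvest_fraction, steps=200, dt=0.1):
    tech = 1.0
    growth_rates = []
    for _ in range(steps):
        growth_rate = reinvest_fraction * tech      # the rate rises as tech rises
        tech += dt * growth_rate * tech
        growth_rates.append(growth_rate)
    return tech, growth_rates

tech, rates = simulate(0.04)
print(f"growth rate at start: {rates[0]:.3f}, at end: {rates[-1]:.3f}; tech level: {tech:.1f}")
# Contrast with a fixed growth rate (ordinary exponential growth), where the
# doubling time stays constant instead of shrinking.
```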

As for precise timings, the great stagnation (the last 30-ish years) just seems like it would stretch out the timeline a bit, so we shouldn't take the 2050s too literally - however well the last 70 years fit an exponential trend line, there's really no way to make an exponential fit the overall trajectory, as that post makes clear.

Comment by sdm on Open & Welcome Thread - August 2020 · 2020-08-20T11:45:22.501Z · LW · GW

Many alignment approaches require at least some initial success at directly eliciting human preferences to get off the ground - there have been some excellent recent posts about the problems this presents. In part because of arguments like these, there has been far more focus on the question of preference elicitation than on the question of preference aggregation:

The maximally ambitious approach has a natural theoretical appeal, but it also seems quite hard. It requires understanding human preferences in domains where humans are typically very uncertain, and where our answers to simple questions are often inconsistent, like how we should balance our own welfare with the welfare of others, or what kinds of activities we really want to pursue vs. enjoying in the moment...
I have written about this problem, pointing out that it is unclear how you would solve it even with an unlimited amount of computing power. My impression is that most practitioners don’t think of this problem even as a long-term research goal — it’s a qualitatively different project without direct relevance to the kinds of problems they want to solve.

I think that this has a lot of merit, but it has sometimes been interpreted as saying that any work on preference aggregation or idealization, before we have a robust way to elicit preferences, is premature. I don't think this is right - in many 'non-ambitious' settings where we aren't trying to build an AGI sovereign over the whole world (for example, designing a powerful AGI to govern the operations of a hospital) you still need to be able to aggregate preferences sensibly and stably.

I've written a rough shortform post with some thoughts on this problem which doesn't approach the question from a 'final' ambitious value-learning perspective but instead tries to look at aggregation the same way we look at elicitation, with an imperfect, RL-based iterative approach to reaching consensus.

...
The Kidney exchange paper elicited preferences from human subjects (using repeated pairwise comparisons) and then aggregated them using the Bradley-Terry model. You couldn't use such a simple statistical method to aggregate quantitative preferences over continuous action spaces, like the preferences that would be learned from a human via a complex reward model. Also, any time you try to use some specific one-shot voting mechanism you run into various impossibility theorems which seem to force you to give up some desirable property.
One approach that may be more robust against errors in a voting mechanism, and easily scalable to more complex preference profiles is to use RL not just for the preference elicitation, but also for the preference aggregation. The idea is that we embrace the inevitable impossibility results (such as Arrow and GS theorems) and consider agents' ability to vote strategically as an opportunity to reach stable outcomes. 
This paper uses very simple Q-learning agents with a few different policies - epsilon-greedy, greedy and upper confidence bound, in an iterated voting game, and gets behaviour that seems sensible. (Note the similarity and differences with the moral parliament, where a particular one-shot voting rule is justified a priori and then used.)
The fact that this paper exists is a good sign because it's very recent and the methods it uses are very simple - it's pretty much just a proof of concept, as the authors state - so that tells me there's a lot of room for combining more sophisticated RL with better voting methods.

Approaches like these seem especially urgent if AI timelines are shorter than we expect, which has been argued based on results from GPT-3. If this is the case, we might need to be dealing with questions of aggregation relatively soon with methods somewhat like current deep learning, and so won't have time to ensure that we have a perfect solution to elicitation before moving on to aggregation.

Comment by sdm on SDM's Shortform · 2020-08-20T11:27:17.491Z · LW · GW

Improving preference learning approaches

When examining value learning approaches to AI Alignment, we run into two classes of problem: we want to understand how to elicit preferences, which is very difficult even in theory with infinite computing power, and we want to know how to aggregate preferences stably and correctly, which is not just difficult but runs into complicated social choice and normative ethical issues.

Many research programs say the second of these questions is less important than the first, especially if we expect continuous takeoff with many chances to course-correct, and a low likelihood of an AI singleton with decisive strategic advantage. For many, building an AI that can reliably extract and pursue the preferences of one person is good enough.

Christiano calls this 'the narrow approach' and sees it as a way to sidestep many of the ethical issues, including those around social choice; tackling those head-on is the province of the 'ambitious' approaches.

We want to build machines that helps us do the things we want to do, and to that end they need to be able to understand what we are trying to do and what instrumental values guide our behavior. To the extent that our “preferences” are underdetermined or inconsistent, we are happy if our systems at least do as well as a human, and make the kinds of improvements that humans would reliably consider improvements.
But it’s not clear that anything short of the maximally ambitious approach can solve the problem we ultimately care about.

I think that the ambitious approach is still worth investigating, because it may well eventually need to be solved, and because it may need to be addressed in a more limited form even on the narrow approach (one could imagine an AGI with a lot of autonomy having to trade off the preferences of, say, a hundred different people). But even the 'narrow' approach raises difficult psychological issues about how to distinguish legitimate preferences from bias - questions of elicitation. In other words, the cognitive science issues around elicitation (distinguishing bias from legitimate preference) must be resolved for any kind of preference learning to work, and the social choice and ethical issues around preference aggregation need at least preliminary solutions for any alignment method that aims to apply to more than one person - even if final, provably correct solutions to aggregation are only needed if designing a singleton with decisive strategic advantage.

I believe I've located two areas that are under- or unexplored for improving the ability of reward-modelling approaches to elicit and to aggregate human preferences. These are: using multiple, diverging information sources from a human (approval and actions) to help extract unbiased preferences, and using RL proxy agents in iterated voting, rather than some direct statistical method, to reach a consensus aggregation. Neither of these is a complete solution, of course, for reasons discussed e.g. here by Stuart Armstrong, but they could nonetheless help.

Improving preference elicitation: multiple information sources

Eliciting the unbiased preferences of an individual human is extremely difficult, for reasons given here.

The agent's actions can be explained by their beliefs and preferences[1], and by their biases: by this, we mean the way in which the action selector differs from an unboundedly rational expected preference maximiser.
The results of the Occam's razor paper imply that preferences (and beliefs, and biases) cannot be deduced separately from knowing the agent's policy (and hence, a fortiori, from any observations of the agent's behaviour).

...

To get around the impossibility result, we need "normative assumptions": assumptions about the preferences (or beliefs, or biases) of the agent that cannot be deduced fully from observations.
Under the optimistic scenario, we don't need many of these, at least for identifying human preferences. We can label a few examples ("the anchoring bias, as illustrated in this scenario, is a bias"; "people are at least weakly rational"; "humans often don't think about new courses of action they've never seen before", etc...). Call this labelled data[2] D.
The algorithm now constructs categories preferences*, beliefs*, and biases* - these are the generalisations that it has achieved from D

Yes, even on the 'optimistic scenario' we need external information of various kinds to 'debias'. However, this external information can come from a human interacting with the AI, in the form of human approval of trajectories or actions taken or proposed by an AI agent, on the assumption that since our stated and revealed preferences diverge, there will sometimes be differences in what we approve of and what we do that are due solely to differences in bias.

This is still technically external to observing the human's behaviour, but it is essentially a second input channel for information about human preferences and biases. This only works, of course, if what humans approve of differs from what they actually do in ways influenced by bias (otherwise you have the same information as you'd get from actions, which helps with improving accuracy but not with debiasing - see here), which is the case at least some of the time.

In other words, the beliefs and preferences are unchanged whether the agent acts or approves, but the 'approval selector' sometimes differs from the 'action selector'. Based on what does and does not change between the two channels, you can try to infer what originated from legitimate beliefs and preferences and what originated from variation between the approval and action selectors - which must be bias.

So, for example, if we conducted a principal component analysis on π, we would expect that the components would all be mixes of preferences/beliefs/biases.

So a PCA performed on the approval would produce a mix of beliefs, preferences and (different) biases. Underlying preferences are, by specification, equally represented by human actions and by human approval of actions taken (since, whichever channel we look at, they are your preferences), but many biases don't exhibit this pattern - for example, we discount more over time in our revealed preferences than in our stated preferences. What we approve of typically represents a less (or at least differently) biased response than what we actually do.

There has already been research on combining information on reward models from multiple sources to infer a better overall reward model, but not, as far as I know, on actions and approval specifically as differently biased sources of information.

CIRL ought to extract our revealed preferences (since it's based on behavioural policy) while a method like reinforcement learning from human preferences should extract our stated preferences - that might be a place to start, at least on validating that there actually are relevant differences caused by differently strong biases in our stated vs revealed preferences, and that the methods actually do end up with different policies.

The goal here would be to have some kind of 'dual channel' preference learner that extracts beliefs and preferences from biased actions and approval by examining what varies. I'm sure you'd still need labelling and explicit information about what counts as a bias, but there might need to be a lot less than with a single information source. How much less (how much extra information you get from such divergences) seems like an empirical question. A useful first step would be finding out how often stated and revealed preferences diverge in ways that actually influence the learned policies of agents designed to infer human preferences from actions vs approval. Stuart Armstrong:

In the pessimistic scenario, human preferences, biases, and beliefs are twisted together in a far more complicated way, and cannot be separated by a few examples.
In contrast, take examples of racial bias, hindsight bias, illusion of control, or naive realism. These biases all seem to be quite different from the anchoring bias, and quite different from each other. At the very least, they seem to be of different "type signature".
So, under the pessimistic scenario, some biases are much closer to preferences than generic biases (and generic preferences) are to each other.

What I've suggested should still help at least somewhat in the pessimistic scenario - unless preferences/beliefs vary when you switch between looking at approval vs actions more than biases vary, you can still gain some information on underlying preferences and beliefs by seeing how approval and actions differ.

Of the difficult examples you gave, racial bias at least varies between actions and approval. Implementing different reward modelling algorithms and messing around with them to try and find ways to extract unbiased preferences from multiple information sources might be a useful research agenda.

There has already been research done on using multiple information sources to improve the accuracy of preference learning - Reward-rational implicit choice - but not specifically on using the divergences between different sources of information from the same agent to learn about the agent's unbiased preferences.
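As a toy illustration of the 'dual channel' idea (not a working reward-modelling method - the two channels are simulated with hand-picked discount factors), here is how a divergence between the action channel and the approval channel, plus the normative assumption that underlying preferences are channel-independent, points at a discounting bias rather than a preference:

```python
# Toy "human": the underlying rewards are identical for both channels, but the
# action channel discounts the future much more heavily than the approval channel.
OPTIONS = {"small_now": (2.0, 0), "large_later": (5.0, 4)}   # (reward, delay)
GAMMA_ACTION, GAMMA_APPROVAL = 0.7, 0.95                     # invented bias levels

def choose(gamma):
    def value(name):
        reward, delay = OPTIONS[name]
        return reward * gamma ** delay
    return max(OPTIONS, key=value)

print("action channel picks:  ", choose(GAMMA_ACTION))     # small_now
print("approval channel picks:", choose(GAMMA_APPROVAL))   # large_later

# A learner watching only one channel could explain either choice with some
# (preference, discount) pair. Seeing the divergence between channels lets it
# attribute the difference to the discounting, not to the underlying rewards.
def discounts_consistent_with(observed_choice):
    grid = [g / 100 for g in range(50, 100)]
    return [g for g in grid if choose(g) == observed_choice]

print("discounts consistent with the action channel:  ",
      (min(discounts_consistent_with("small_now")),
       max(discounts_consistent_with("small_now"))))
print("discounts consistent with the approval channel:",
      (min(discounts_consistent_with("large_later")),
       max(discounts_consistent_with("large_later"))))
```

The divergence itself is the extra information: with only one channel you can't tell a patient agent with unusual rewards from an impatient agent with ordinary ones.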

Improving preference aggregation: iterated voting games

In part because of arguments like these, there has been less focus on the aggregation side of things than on the direct preference learning side.

Christiano says of methods like CEV, which aim to extrapolate what I ‘really want’ far beyond my current preferences: ‘most practitioners don’t think of this problem even as a long-term research goal — it’s a qualitatively different project without direct relevance to the kinds of problems they want to solve’. This is effectively a statement of the Well-definedness consideration when sorting through value definitions - our long-term ‘coherent’ or ‘true’ preferences currently aren't well enough understood to guide research, so we need to restrict ourselves to more direct normativity: extracting the actual preferences of existing humans.

However, I think that it is important to get on the right track early - even if we never have cause to build a powerful singleton AI that has to aggregate all the preferences of humanity, there will still probably be smaller-scale situations where the preferences of several people need to be aggregated or traded off. Shifting a human preference learner from a single human's preferences to a small group's could produce erroneous results due to distributional shift, potentially causing alignment failures, so even if we aren't trying for maximally ambitious value learning it might still be worth investigating preference aggregation.

There has been some research done on preference aggregation for AIs learning human values, specifically in the context of Kidney exchanges:

We performed statistical modeling of participants’ pairwise comparisons between patient profiles in order to obtain weights for each profile. We used the Bradley-Terry model, which treats each pairwise comparison as a contest between a pair of players
We have shown one way in which moral judgments can be elicited from human subjects, how those judgments can be statistically modelled, and how the results can be incorporated into the algorithm. We have also shown, through simulations, what the likely effects of deploying such a prioritization system would be, namely that under demanded pairs would be significantly impacted but little would change for others. We do not make any judgment about whether this conclusion speaks in favor of or against such prioritization, but expect the conclusion to be robust to changes in the prioritization such as those that would result from a more thorough process, as described in the previous paragraph.

The Kidney exchange paper elicited preferences from human subjects (using repeated pairwise comparisons) and then aggregated them using the Bradley-Terry model. You couldn't use such a simple statistical method to aggregate quantitative preferences over continuous action spaces, like the preferences that would be learned from a human via a complex reward model. Also, any time you try to use some specific one-shot voting mechanism you run into various impossibility theorems which seem to force you to give up some desirable property.
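For concreteness, here is roughly what that Bradley-Terry aggregation step looks like in the simple discrete case - a hedged sketch with invented comparison counts, using the standard MM/Zermelo iteration rather than whatever fitting procedure the paper actually used:

```python
# wins[(a, b)] = number of times profile a was preferred to profile b in the
# elicited pairwise comparisons (counts invented for illustration).
wins = {
    ("young", "old"): 8, ("old", "young"): 2,
    ("young", "smoker"): 9, ("smoker", "young"): 1,
    ("old", "smoker"): 6, ("smoker", "old"): 4,
}
profiles = sorted({p for pair in wins for p in pair})
weights = {p: 1.0 for p in profiles}          # Bradley-Terry strength parameters

for _ in range(100):                          # MM iterations
    new = {}
    for i in profiles:
        total_wins = sum(w for (a, _), w in wins.items() if a == i)
        denom = sum((wins.get((i, j), 0) + wins.get((j, i), 0)) / (weights[i] + weights[j])
                    for j in profiles if j != i)
        new[i] = total_wins / denom
    norm = sum(new.values())
    weights = {p: v / norm for p, v in new.items()}

print({p: round(w, 3) for p, w in weights.items()})
# Higher weight = preferred more often; under the model the probability that i
# beats j in a comparison is weights[i] / (weights[i] + weights[j]).
```

The profile names are hypothetical; the point is just that pairwise judgments get compressed into a single weight per profile, which is exactly the step that stops working once preferences live over large continuous spaces.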

One approach that may be more robust against errors in a voting mechanism, and easily scalable to more complex preference profiles is to use RL not just for the preference elicitation, but also for the preference aggregation. The idea is that we embrace the inevitable impossibility results (such as Arrow and GS theorems) and consider agents' ability to vote strategically as an opportunity to reach stable outcomes. 

This paper uses very simple Q-learning agents with a few different policies - epsilon-greedy, greedy and upper confidence bound, in an iterated voting game, and gets behaviour that seems sensible. (Note the similarity and differences with the moral parliament, where a particular one-shot voting rule is justified a priori and then used.)

The fact that this paper exists is a good sign because it's very recent and the methods it uses are very simple - it's pretty much just a proof of concept, as the authors state - so that tells me there's a lot of room for combining more sophisticated RL with better voting methods.

Combining elicitation and aggregation

Having elicited preferences from each individual human (using methods like those above to 'debias'), we obtain a proxy agent representing each individual's preferences. Then these agents can be placed into an iterated voting situation until a convergent answer is reached.

That seems like the closest practical approximation to a CEV of a group of people that could be constructed with anything close to current methods - a pipeline from observed behaviour and elicited approval to a final aggregated decision about what to do based on overall preferences. Since it's a somewhat indirect value learning framework that's extendable to a group of any size, you might call it a Coherent Extrapolated Framework (CEF), as I suggested last year.

Comment by sdm on Learning human preferences: optimistic and pessimistic scenarios · 2020-08-19T18:22:23.914Z · LW · GW
To get around the impossibility result, we need "normative assumptions": assumptions about the preferences (or beliefs, or biases) of the agent that cannot be deduced fully from observations.
Under the optimistic scenario, we don't need many of these, at least for identifying human preferences. We can label a few examples ("the anchoring bias, as illustrated in this scenario, is a bias"; "people are at least weakly rational"; "humans often don't think about new courses of action they've never seen before", etc...). Call this labelled data[2] D.
The algorithm now constructs categories preferences*, beliefs*, and biases* - these are the generalisations that it has achieved from D

Yes, even on the 'optimistic scenario' we need external information of various kinds to 'debias'. However, this external information can come from a human interacting with the AI, in the form of human approval of trajectories or actions taken or proposed by an AI agent, on the assumption that since our stated and revealed preferences diverge, there will sometimes be differences in what we approve of and what we do that are due solely to differences in bias.

This is still technically external to observing the human's behaviour, but it is essentially a second input channel for information about human preferences and biases. This only works, of course, if what humans approve of differs from what they actually do in ways influenced by bias (otherwise you have the same information as you'd get from actions, which helps with improving accuracy but not with debiasing - see here), which is the case at least some of the time.

In other words, the beliefs and preferences are unchanged whether the agent acts or approves, but the 'approval selector' sometimes differs from the 'action selector'. Based on what does and does not change between the two channels, you can try to infer what originated from legitimate beliefs and preferences and what originated from variation between the approval and action selectors - which must be bias.

So, for example, if we conducted a principal component analysis on π, we would expect that the components would all be mixes of preferences/beliefs/biases.

So a PCA performed on the approval would produce a mix of beliefs, preferences and (different) biases. Underlying preferences are, by specification, equally represented by human actions and by human approval of actions taken (since, whichever channel we look at, they are your preferences), but many biases don't exhibit this pattern - for example, we discount more over time in our revealed preferences than in our stated preferences. What we approve of typically represents a less (or at least differently) biased response than what we actually do.

There has already been research on combining information on reward models from multiple sources to infer a better overall reward model, but not, as far as I know, on actions and approval specifically as differently biased sources of information.

CIRL ought to extract our revealed preferences (since it's based on behavioural policy) while a method like reinforcement learning from human preferences should extract our stated preferences - that might be a place to start, at least on validating that there actually are relevant differences caused by differently strong biases in our stated vs revealed preferences, and that the methods actually do end up with different policies.

The goal here would be to have some kind of 'dual channel' preference learner that extracts beliefs and preferences from biased actions and approval by examining what varies. I'm sure you'd still need labelling and explicit information about what counts as a bias, but there might need to be a lot less than with a single information source. How much less (how much extra information you get from such divergences) seems like an empirical question. A useful first step would be finding out how often stated and revealed preferences diverge in ways that actually influence the learned policies of agents designed to infer human preferences from actions vs approval.

In the pessimistic scenario, human preferences, biases, and beliefs are twisted together in a far more complicated way, and cannot be separated by a few examples.
In contrast, take examples of racial bias, hindsight bias, illusion of control, or naive realism. These biases all seem to be quite different from the anchoring bias, and quite different from each other. At the very least, they seem to be of different "type signature".
So, under the pessimistic scenario, some biases are much closer to preferences than generic biases (and generic preferences) are to each other.

What I've suggested should still help at least somewhat in the pessimistic scenario - unless preferences/beliefs vary when you switch between looking at approval vs actions more than biases vary, you can still gain some information on underlying preferences and beliefs by seeing how approval and actions differ.

Of the difficult examples you gave, racial bias at least varies between actions and approval. Implementing different reward modelling algorithms and messing around with them to try and find ways to extract unbiased preferences from multiple information sources might be a useful research agenda.

There has already been research done on using multiple information sources to improve the accuracy of preference learning - Reward-rational implicit choice - but not specifically on using the divergences between different sources of information from the same agent to learn about the agent's unbiased preferences.

Comment by sdm on Open & Welcome Thread - August 2020 · 2020-08-15T16:38:27.193Z · LW · GW

Covid19Projections has been one of the most successful coronavirus models, in large part because it is as 'model-free' and simple as possible: it uses machine learning to fit the parameters of a simple SEIR model from death data only. This has proved useful because case numbers are skewed by varying numbers of tests, so deaths are a more consistently reliable metric. You can see the code here.

However, in countries doing a lot of testing, with a reasonable number of cases but very few deaths - like most of Europe - the model is not that informative, and essentially predicts near-zero deaths out to the limit of its forecast. This is expected - the model is optimised for the US.

Estimating SEIR parameters based on deaths works well when you have a lot of deaths to count; if you don't, then you need another method. Estimating purely based on cases has its own pitfalls - see this from epidemic forecasting, which mistook an increase in testing in the UK in mid-July for a sharp jump in cases and wrongly inferred a brief jump in R_t. As far as I understand their paper, the estimate of R_t from case data adjusts for delays from infection to onset and for other things, but not for the positivity rate or how good overall testing is.

This isn't surprising - there is no simple model that combines the test positivity rate and the number of cases and estimates the actual current number of infections. But perhaps you could use a Covid19pro-like method to learn such a mapping.

Very oversimplified, Covid19pro works like this:

Our COVID-19 prediction model adds the power of artificial intelligence on top of a classic infectious disease model. We developed a simulator based on the SEIR model (Wikipedia) to simulate the COVID-19 epidemic in each region. The parameters/inputs of this simulator are then learned using machine learning techniques that attempts to minimize the error between the projected outputs and the actual results. We utilize daily deaths data reported by each region to forecast future reported deaths. After some additional validation techniques (to minimize a phenomenon called overfitting), we use the learned parameters to simulate the future and make projections.

In other words, there are two functions, f and g: f estimates the SEIR (susceptible, exposed, infectious, recovered) parameters from the deaths reported up to some time t_0, and g projects future deaths from those parameters. Both are optimised to minimise the error between the projected deaths and the actual deaths observed at a later time t_1.
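As a very stripped-down sketch of that structure (my paraphrase in Python, not the actual covid19-projections code - the real thing is linked above and does the parameter search quite differently):

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize

def seir(y, t, beta, sigma, gamma, N):
    # Standard SEIR dynamics
    S, E, I, R = y
    dS = -beta * S * I / N
    dE = beta * S * I / N - sigma * E
    dI = sigma * E - gamma * I
    dR = gamma * I
    return dS, dE, dI, dR

def g(params, N, days):
    """Simulate the SEIR model and return cumulative projected deaths."""
    beta, sigma, gamma, ifr, E0 = params
    y0 = (N - E0, E0, 0.0, 0.0)
    t = np.arange(days)
    S, E, I, R = odeint(seir, y0, t, args=(beta, sigma, gamma, N)).T
    return ifr * R   # crude: a fixed fraction of resolved infections die, with no delay

def f(observed_deaths, N, days):
    """Back out SEIR parameters by fitting projected deaths to observed deaths."""
    def loss(params):
        projected = g(params, N, days)[:len(observed_deaths)]
        return np.mean((projected - observed_deaths) ** 2)
    x0 = np.array([0.3, 1 / 5, 1 / 10, 0.01, 100.0])   # illustrative starting guess
    return minimize(loss, x0, method="Nelder-Mead").x

# Usage: fit on deaths observed up to t_0, then simulate further out with g.
# params = f(deaths_up_to_t0, N=66_000_000, days=len(deaths_up_to_t0))
# projection = g(params, N=66_000_000, days=200)
```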

This oversimplification is deliberate:

Deaths data only: Our model only uses daily deaths data as reported by Johns Hopkins University. Unlike other models, we do not use additional data sources such as cases, testing, mobility, temperature, age distribution, air traffic, etc. While supplementary data sources may be helpful, they can also introduce additional noise and complexity which can notably skew results.

What I suggest is a slight increase in complexity, where we use a similar model except that we feed it paired test positivity rate and case data instead of death data. The positivity rate (or tests per case) serves as a 'quality estimate', telling you how reliable the case data is. That's how tests per case is treated by Our World in Data. We all know intuitively that if the positivity rate is going down but cases are going up, the increase might not be real, but if the positivity rate is going up and cases are going up, the increase definitely is real.

What I'm suggesting is that we combine the two: learn the mapping from the paired (cases, tests per case) data to the SEIR parameters, while still scoring the model against deaths.

Now, you need reliable data on the number of people tested each week, but most of Europe has that. If you can learn a model that gives you a more accurate estimate of the SEIR parameters from combined case and tests-per-case data, then it should be better at predicting future infections. It won't necessarily predict future cases, since the number of future cases also depends on the number of tests conducted, which is subject to all sorts of random fluctuations we don't care about when modelling disease transmission. So instead you could use the same loss function as the original covid19pro - minimizing the difference between projected and actual deaths.
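As a sketch of how that change might look (same caveats as before: the 'ascertainment' function and all parameter values here are my own illustrative assumptions, not a validated model - the point is just that the paired input and the death-based loss can coexist):

```python
import numpy as np
from scipy.optimize import minimize

def implied_infections(cases, tests_per_case, k):
    # Hypothetical ascertainment correction: the more tests per case, the larger
    # the share of true infections that show up as confirmed cases.
    ascertainment = tests_per_case / (tests_per_case + k)
    return cases / np.clip(ascertainment, 1e-3, 1.0)

def fit_from_cases(cases, tests_per_case, observed_deaths, death_lag=21):
    """Fit (k, ifr) so that implied infections, lagged and scaled, match reported deaths."""
    def loss(params):
        k, ifr = params
        infections = implied_infections(cases, tests_per_case, k)
        projected = ifr * np.concatenate([np.zeros(death_lag), infections[:-death_lag]])
        return np.mean((np.cumsum(projected) - np.cumsum(observed_deaths)) ** 2)
    return minimize(loss, x0=np.array([10.0, 0.007]), method="Nelder-Mead").x

# The corrected infection series (rather than raw cases or deaths) would then be what
# the SEIR parameters in the sketch above are estimated from.
```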

Hopefully the intuition that you can learn more from the pair (tests per case, number of cases) than from the number of cases or the number of deaths alone will be borne out, and a c19pro-like model could be trained to make high-quality predictions in places with few deaths using such paired data. You would still need some deaths for the loss function and for fitting the model.

Comment by sdm on Developmental Stages of GPTs · 2020-08-15T15:43:05.761Z · LW · GW

Superintelligence and other classic presentations of AI risk definitely offer additional arguments/considerations. The likelihood of extremely discontinuous/localized progress is, of course, the most prominent one.

Perhaps what is going on here is that the arguments as stated in brief summaries like 'orthogonality thesis + instrumental convergence' just aren't what the arguments actually were, and that there were from the start all sorts of empirical or more specific claims made around these general arguments.

This reminds me of Lakatos' theory of research programs - where the core assumptions, usually logical or a priori in nature, are used to 'spin off' secondary hypotheses that are more empirical or easily falsifiable.

Lakatos' model fits AI safety rather well: the orthogonality thesis and instrumental convergence are the non-empirical 'hard core' assumptions that are foundational to the research program. In ~2010 the secondary assumptions were things like discontinuous progress and AI maximising a simple utility function; in ~2020 we have some different secondary assumptions: mesa-optimisers, 'you get what you measure', and direct evidence of current misalignment.

Comment by sdm on Do we have updated data about the risk of ~ permanent chronic fatigue from COVID-19? · 2020-08-15T15:24:04.089Z · LW · GW

Fatigue that lasts 2-3 weeks after the worst symptoms are over is common with essentially all bad viral infections - post-flu fatigue is common, for example (I can't find any good statistics on how common). So I don't know whether 1/3 of patients reporting fatigue 2 to 3 weeks afterwards tells us anything useful about how common post-covid fatigue lasting months is.

Comment by sdm on The Case for Education · 2020-08-15T14:30:45.657Z · LW · GW

So let me now make the case for education.

Education is key to civilizational sanity, sensemaking, and survival. Education is key to The Secret of our Success.

Education is the scaffolding on which our society, culture and civilization are built and maintained.

I think that, rather like the rationalist criticism of healthcare, a lot of this is US-centric and, while it still applies to Europe and the UK, it does so less strongly. There's still credentialism, signalling, and an element of zero-sum competition, but many of the most egregious examples of the university system promoting costly signalling ahead of actual training and growth of knowledge (the lack of subject focus in college degrees, medical school being separate from university, colossal cost disease, 'holistic' admissions) either don't exist in Europe or aren't as egregious.

I also think that you underestimate the number of EA and rationalist types who are working within the university system - most technical AI safety research not being done by OpenAI/DeepMind is in some way affiliated with a university, for example.

Comment by sdm on Covid 8/13: Same As It Ever Was · 2020-08-14T10:44:12.275Z · LW · GW
What makes him unique is that Bill Gates is actually trying.
As far as I can tell, no one else with billions of dollars is actually trying to help as best they can. Those same effective altruists are full of detailed thoughts, but aside from shamefully deplatforming Robin Hanson it’s been a long time since I’ve heard about them making a serious attempt to do anything.

I agree with you about the Hanson thing, but the EA movement did its best to shift as much funding as was practical towards coronavirus-related causes. This page covers GiveWell's efforts, and this covers the career and contribution advice of 80,000 Hours. I know more than a few EA types who dropped whatever they were doing in March to try and focus on coronavirus modelling - for example, FHI's Epidemic Forecasting project.

Bill Gates didn’t. He’s out there doing the best he knows how to do.
Thus, we should quote Theodore Roosevelt, and first and foremost applaud him and learn from him.
Also, read the whole thing. Mostly the information speaks for itself.

I found the entire podcast to be quite astounding, especially the part where Gates explained how he had to sit down and patiently listen to Trump saying vaccines didn't work. When I consider how much of America apparently hates him despite all this, it couldn't help but remind me of a certain quote.

I still don't understand it. They should have known that their lives depended on that man's success. And yet it was as if they tried to do everything they could to make his life unpleasant. To throw every possible obstacle into his way...

As to your discussion about Hospitalization rates - it's interesting to note how our picture of the overall hospitalization rate has evolved over time, from estimating near 20% to as low as 2%. I wrote a long comment with an estimation of what it might be for the UK, with this conclusion -

We know from the ONS that the total number of patients ever admitted to hospital with coronavirus by July 22nd was 131,412. That number is probably pretty close to accurate - even during the worst of the epidemic the UK was testing more or less every hospital patient with coronavirus symptoms. The estimated number of people ever infected by July 22nd according to c19pro was 5,751,036.
So, 131,412 / 5,751,036 = 2.3% hospitalization rate.
Comment by sdm on Alignment By Default · 2020-08-13T16:53:35.798Z · LW · GW

‘You get what you measure’ (outer alignment failure) and mesa-optimisers (inner alignment failure) are both potential gap-fillers that explain why the alignment/capability divergence initially arises. Whether it's one or the other, I think the overall point is still that there is this gap in the classic arguments that allows for a (possibly quite high) chance of 'alignment by default', for the reasons you give, but there are at least two plausible mechanisms that fill this gap. And then I suppose my broader point would be that we should present:

Classic Arguments —> objections to them (capability and alignment often go together, could get alignment by default) —> specific causal mechanisms for misalignment

Comment by sdm on Alignment By Default · 2020-08-13T12:45:22.540Z · LW · GW

I think what you've identified here is a weakness in the high-level, classic arguments for AI risk -

Overall, I’d give maybe a 10-20% chance of alignment by this path, assuming that the unsupervised system does end up with a simple embedding of human values. The main failure mode I’d expect, assuming we get the chance to iterate, is deception - not necessarily “intentional” deception, just the system being optimized to look like it’s working the way we want rather than actually working the way we want. It’s the proxy problem again, but this time at the level of humans-trying-things-and-seeing-if-they-work, rather than explicit training objectives.

This failure mode of deceptive alignment seems like it would result most easily from mesa-optimisation, or an inner alignment failure. Inner misalignment is possibly the key specific mechanism that fills a weakness in the 'classic arguments' for AI safety - the Orthogonality Thesis, Instrumental Convergence and Fast Progress, which together imply that small separations between AI alignment and AI capability can lead to catastrophic outcomes. The question of why there would be such a damaging, hard-to-detect divergence between the system's goals and what we want needs an answer if we are to have a solid, specific reason to expect dangerous misalignment, and inner misalignment is just such a reason.

I think that it should be presented in initial introductions to AI risk alongside those classic arguments, as the specific, technical reason why the techniques we actually use are likely to produce such a goal/capability divergence - rather than the general a priori reasons given by the classic arguments.

Comment by sdm on Buck's Shortform · 2020-08-06T14:26:02.228Z · LW · GW

I wrote a whole post on modelling specific continuous or discontinuous scenarios. In the course of trying to make a very simple differential-equation model of continuous takeoff, by modifying the models given by Bostrom/Yudkowsky for fast takeoff, the result that fast takeoff means later timelines naturally jumps out.

Varying d between 0 (no RSI) and infinity (a discontinuity) while holding everything else constant looks like this [figure: 'Continuous Progress', in the linked post]. If we compare the trajectories, we see two effects - the more continuous the progress is (lower d), the earlier we see growth accelerating above the exponential trend-line (except for slow progress, where growth is always just exponential), and the smoother the transition to the new growth mode is. For d=0.5, AGI was reached at t=1.5, but for discontinuous progress this was not until after t=2. As Paul Christiano says, slow takeoff seems to mean that AI has a larger impact on the world, sooner.

But that model relies on pre-setting a fixed 'threshold for AGI', given by the parameter I_AGI, in advance. This, along with the starting intelligence of the system, fixes how far away AGI is.

For values of d between 0 and infinity we have varying steepnesses of continuous progress. I_AGI is the intelligence level we identify with AGI. In the discontinuous case, it is where the jump occurs; in the continuous case, it is the centre of the logistic curve. Here I_AGI = 4.

You could (I might get round to doing this) model the effect you're talking about by allowing I_AGI to vary with the level of discontinuity. Every model would start with the same initial intelligence I_0, but I_AGI would be correlated with the level of discontinuity, with a larger discontinuity implying a smaller I_AGI. That way, you would reproduce the epistemic difference of expecting a stronger discontinuity - that the current intelligence of AI systems is implied to be closer to what we'd expect to need for explosive growth in discontinuous takeoff scenarios than in continuous scenarios.
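A toy numerical version of this (not the exact equations from my post - the RSI term, the parameter values, and the particular coupling between d and I_AGI below are all just illustrative):

```python
import numpy as np

def trajectory(d, I_AGI, I0=1.0, base_rate=0.1, boost=2.0, dt=0.01, T=10.0):
    # Intelligence grows exponentially, plus an RSI boost that switches on around
    # I_AGI via a logistic of steepness d. d -> infinity approximates a discontinuous
    # jump at I_AGI; smaller d gives a smoother takeoff that starts accelerating earlier.
    I, path = I0, []
    for _ in range(int(T / dt)):
        rsi = boost / (1.0 + np.exp(-d * (I - I_AGI)))
        I += dt * base_rate * I * (1.0 + rsi)
        path.append(I)
    return np.array(path)

def I_AGI_for(d, base=4.0):
    # The suggested modification: couple the AGI threshold to the discontinuity,
    # so a stronger discontinuity implies a threshold closer to current capability.
    return base / (1.0 + 0.1 * d)

for d in [0.5, 2.0, 20.0]:
    print(d, I_AGI_for(d), trajectory(d, I_AGI_for(d))[-1])
```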

We know the current level of capability and the current rate of progress, but we don't know I_AGI, and holding all else constant, slow takeoff implies that I_AGI is a significantly higher number (again, I_AGI is relative to the starting intelligence of the system).

This is because my model was trying to model different physical situations, different ways AGI could be, not different epistemic situations, so I was thinking in terms of I_AGI being some fixed, objective value that we just don't happen to know.

I'm uncertain whether there's a rigorous way of quantifying how much this epistemic update weighs against the physical fact that continuous takeoff implies an earlier acceleration above the exponential trend. If you're right, the update completely cancels that effect out and makes timelines under discontinuous takeoff earlier overall - and I think you are right about this. It would be easy enough to write something that exactly cancels the effect, making takeoff in the different scenarios appear at the same time, but that's not what you have in mind.

Comment by sdm on Coronavirus as a test-run for X-risks · 2020-08-05T13:47:09.741Z · LW · GW

So, two months have gone by. My main conclusions look mostly unchanged, except that I wasn't expecting such a monotonically stable control-system effect in the US. Vaccine news looks better than I expected, and superforecasters are optimistic. The major issue in countries with moderate to good state capacity is preventing a winter second wave and managing small infection spikes. Rob Wiblin seems to buy into the MNM effect.

Whatever happened to the Hospitalization Rate?

Many of these facts (in particular the reason that 100 million plus dead is effectively ruled out) have multiple explanations. For one, the earliest data on coronavirus implied the hospitalization rate was 10-20% for all age groups, and we now know it is substantially lower (that tweet by an author of the Imperial College paper, which estimated a hospitalization rate of 4.4%). This means that if hospitals were entirely unable to cope with the number of patients, the IFR would be in the range of 2%, not the 20% initially implied.

Back in a previous Age of The Earth, also known as early March 2020, the most important thing in the world was to figure out the coronavirus hospitalization rate, and we overestimated it. See e.g.

Suppose 50% of the UK (33 million people) get the virus of which 5% (~ 1.8 million people) will need serious hospitalization [conservative estimate].

It's mostly of academic interest now, since (at least in Europe) genuine exponential spread is looking more and more like the exception rather than the rule, but considering how much time we spent discussing this issue I'd like to know the final answer for completeness' sake. It looks like even 'conservative' estimates of the hospitalization rate were too high by a factor of at least 2, just as claimed by the author of that Imperial paper.

Here's a crude estimate: the latest UK serology survey says 6.2% of people were infected by July 26th. Another says 7.1% were infected by July 30. The level of infection is so low in the UK right now that you'll only get movement by a few tenths of a percentage point over the couple of weeks between then and now.

The false negative rate is unclear, but I've heard claims as high as a third, so the real number may be as high as 9.3% based on the overall infection survey. Covid19pro estimated that 8.6% (5.1-13.3%) had been infected by July 26th. That 8.6% number seems to correspond to a reasonable false negative rate on the antibody tests (28% if you believe the first survey, ~17% if you believe the second).

In other words, the median estimates from covid19pro look reasonably consistent with the antibody tests, implying a false negative rate of about 15-30%, so I'm just going to assume they're roughly accurate.

We know from the ONS that the total number of patients ever admitted to hospital with coronavirus by July 22nd was 131,412. That number is probably pretty close to accurate - even during the worst of the epidemic the UK was testing more or less every hospital patient with coronavirus symptoms. The estimated number of people ever infected by July 22nd according to c19pro was 5,751,036.

So, 131,412 / 5,751,036 = 2.3% hospitalization rate.
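A quick check of the arithmetic above (assuming the antibody tests miss infections at a fixed false-negative rate, i.e. measured = true * (1 - FNR)):

```python
measured_1, measured_2 = 0.062, 0.071   # the two serology survey estimates above
c19pro = 0.086                          # covid19pro central estimate for July 26th

print(measured_1 / (1 - 1/3))           # ~0.093 -> 'as high as 9.3%'
print(1 - measured_1 / c19pro)          # ~0.28  -> implied FNR against the first survey
print(1 - measured_2 / c19pro)          # ~0.17  -> implied FNR against the second survey
print(131_412 / 5_751_036)              # ~0.023 -> the ~2.3% hospitalization rate
```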

Comment by sdm on A sketch of 'Simulacra Levels and their Interactions' · 2020-08-05T10:09:47.712Z · LW · GW

Harry Frankfurt's On Bullshit seems relevant here. I think it's worth trying to incorporate Frankfurt's definition as well, as it is quite widely known (see e.g. this video). If you were to do so, I think you would say that on Frankfurt's definition, Level 1 tells the truth, Level 2 lies, Level 3 bullshits about physical facts but will lie or tell the truth about things in the social realm (e.g. others' motives, your own affiliation), and Level 4 always bullshits.

Taken this way, Frankfurt's model is a higher-level model that distinguishes the ones who care about reality from the ones that don't - roughly speaking, bullshit characterises levels 3 and 4, the levels unconcerned with reality.

If you did it on the diagram, the union of 3 and 4 would be bullshitters, but shading more strongly towards the 4 end.

Comment by sdm on Unifying the Simulacra Definitions · 2020-08-05T10:07:03.137Z · LW · GW

If your aim is to unify different ways of understanding dishonesty, social manipulation and 'simulacra', then Harry Frankfurt's On Bullshit needs to be considered.

What bullshit essentially misrepresents is neither the state of affairs to which it refers nor the beliefs of the speaker concerning that state of affairs. Those are what lies misrepresent, by virtue of being false. Since bullshit need not be false, it differs from lies in its misrepresentational intent. The bullshitter may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to.

I think it's worth trying to incorporate Frankfurt's definition as well, as it is quite widely known (see e.g. this video). If you were to do so, I think you would say that on Frankfurt's definition, Level 1 tells the truth, Level 2 lies, Level 3 bullshits about physical facts but will lie or tell the truth about things in the social realm (e.g. others' motives, your own affiliation), and Level 4 always bullshits.

It does seem that bullshitting involves a kind of bluff. It is closer to bluffing, surely than to telling a lie. But what is implied concerning its nature by the fact that it is more like the former than it is like the latter? Just what is the relevant difference here between a bluff and a lie? Lying and bluffing are both modes of misrepresentation or deception. Now the concept most central to the distinctive nature of a lie is that of falsity: the liar is essentially someone who deliberately promulgates a falsehood. Bluffing too is typically devoted to conveying something false. Unlike plain lying, however, it is more especially a matter not of falsity but of fakery. This is what accounts for its nearness to bullshit. For the essence of bullshit is not that it is false but that it is phony. In order to appreciate this distinction, one must recognize that a fake or a phony need not be in any respect (apart from authenticity itself) inferior to the real thing. What is not genuine need not also be defective in some other way. It may be, after all, an exact copy. What is wrong with a counterfeit is not what it is like, but how it was made. This points to a similar and fundamental aspect of the essential nature of bullshit: although it is produced without concern with the truth, it need not be false.

Taken this way, Frankfurt's model is a higher-level model that distinguishes the ones who care about reality from the ones that don't - roughly speaking, bullshit characterises levels 3 and 4, the levels unconcerned with reality.

Comment by sdm on SDM's Shortform · 2020-08-04T18:48:02.393Z · LW · GW
So, the mountain disanalogy: sometimes there are things we have opinions about, and yet there is no clean separation between us and the thing. We don't perceive it in a way that we can agree is trusted or privileged. We receive vague, sparse data about it, and the subject is plagued by disagreement, self-doubt, and claims that other people are doing it all wrong.
This isn't to say that we should give up entirely, but it means that we might have to shift our expectations of what sort of explanation or justification we are "entitled" to.

So this depends on two things - first, how likely (in advance of assessing the 'evidence') something like normative realism is, and then how good that evidence is (how coherent it is). If we have really good reasons in advance to think there's 'no separation between us and the thing', then no matter how coherent the 'thing' is, we have to conclude that while we might all be able to agree on what it is, it isn't mind-independent.

So, is it coherent, and is it mind-independent? How coherent it needs to be for us to be confident we can know it depends on how confident we are that it's mind-independent, and vice versa.

The argument for coherence comes in the form of convergence (not among people, to be clear, but among normative frameworks), but as you say, that doesn't establish that it's mind-independent (though it might give you a strong hint, if it's really strongly consistent and coherent), while the argument that normativity is mind-independent comes from the normativity argument. These three posts deal with the difference between those two arguments, how strong they are, and how they interact:

Normative Anti-realism is self-defeating

Normativity and recursive justification

Prescriptive Anti-realism

Comment by sdm on Open & Welcome Thread - July 2020 · 2020-08-04T18:30:36.198Z · LW · GW

The comment has since been expanded into the (unofficial) Moral Realism sequence. I cover a bunch of issues, including the (often not recognised) distinction between prescriptive and non-prescriptive anti-realism - an issue relevant to some important factual questions (it overlaps with the 'realism about rationality' issue driving some debates in AI safety) - as well as whether we need normative facts and what difference the convergence of moral views may or may not make.

Normative Realism by Degrees

Normative Anti-realism is self-defeating

Normativity and recursive justification

Prescriptive Anti-realism

The goal here was to explain what moral realists like about moral realism - for those who are perplexed about why it would be worth wanting or how anyone could find it plausible - to explain what depends on it being right or wrong, and to show how you may or may not retain some of the features of realism (like universalizability) if different anti-realist views are true.

Comment by sdm on SDM's Shortform · 2020-08-04T15:50:48.485Z · LW · GW

Prescriptive Anti-realism

An extremely unscientific and incomplete list of people who fall into the various categories I gave in the previous post:

1. Accept Convergence and Reject Normativity: Eliezer Yudkowsky, Sam Harris (Interpretation 1), Peter Singer in The Expanding Circle, RM Hare and similar philosophers, HJPEV

2. Accept Convergence and Accept Normativity: Derek Parfit, Sam Harris (Interpretation 2), Peter Singer today, the majority of moral philosophers, Dumbledore

3. Reject Convergence and Reject Normativity: Robin Hanson, Richard Ngo (?), Lucas Gloor (?), most Error Theorists, Quirrell

4. Reject Convergence and Accept Normativity: A few moral philosophers, maybe Ayn Rand and objectivists?

The difference in practical, normative terms between 2), 4) and 3) is clear enough - 2 is a moral realist in the classic sense, 4 is a sceptic about morality but agrees that irreducible normativity exists, and 3 is a classic 'antirealist' who sees morality as of a piece with our other wants. What is less clear is the difference between 1) and 3). In my caricature above, I said Quirrell and Harry Potter from HPMOR were non-prescriptive and prescriptive anti-realists, respectively, while Dumbledore is a realist. Here is a dialogue between them that illustrates the difference.

Harry floundered for words and then decided to simply go with the obvious. "First of all, just because I want to hurt someone doesn't mean it's right -"
"What makes something right, if not your wanting it?"
"Ah," Harry said, "preference utilitarianism."
"Pardon me?" said Professor Quirrell.
"It's the ethical theory that the good is what satisfies the preferences of the most people -"
"No," Professor Quirrell said. His fingers rubbed the bridge of his nose. "I don't think that's quite what I was trying to say. Mr. Potter, in the end people all do what they want to do. Sometimes people give names like 'right' to things they want to do, but how could we possibly act on anything but our own desires?"

The relevant issue here is that Harry draws a distinction between moral and non-moral reasons even though he doesn't believe in irreducible normativity. In particular, he's committed to a normative ethical theory, preference utilitarianism, as a fundamental part of how he values things.

Here is another illustration of the difference. Lucas Gloor (3) explains the case for suffering-focussed ethics, based on the claim that our moral intuitions place diminishing returns on happiness but not (or much less so) on the reduction of suffering.

While there are some people who argue for accepting the repugnant conclusion (Tännsjö, 2004), most people would probably prefer the smaller but happier civilization – at least under some circumstances. One explanation for this preference might lie in intuition one discussed above, “Making people happy rather than making happy people.” However, this is unlikely to be what is going on for everyone who prefers the smaller civilization: If there was a way to double the size of the smaller population while keeping the quality of life perfect, many people would likely consider this option both positive and important. This suggests that some people do care (intrinsically) about adding more lives and/or happiness to the world. But considering that they would not go for the larger civilization in the Repugnant Conclusion thought experiment above, it also seems that they implicitly place diminishing returns on additional happiness, i.e. that the bigger you go, the more making an overall happy population larger is no longer (that) important.
By contrast, people are much less likely to place diminishing returns on reducing suffering – at least insofar as the disvalue of extreme suffering, or the suffering in lives that on the whole do not seem worth living, is concerned. Most people would say that no matter the size of a (finite) population of suffering beings, adding more suffering beings would always remain equally bad.
It should be noted that incorporating diminishing returns to things of positive value into a normative theory is difficult to do in ways that do not seem unsatisfyingly arbitrary. However, perhaps the need to fit all one’s moral intuitions into an overarching theory based solely on intuitively appealing axioms simply cannot be fulfilled.

And what are those difficulties mentioned? The most obvious is the absurd conclusion - that scaling up a population can turn it from axiologically good to bad:

Hence, given the reasonable assumption that the negative value of adding extra lives with negative welfare does not decrease relatively to population size, a proportional expansion in the population size can turn a good population into a bad one—a version of the so-called “Absurd Conclusion” (Parfit 1984). A population of one million people enjoying very high positive welfare and one person with negative welfare seems intuitively to be a good population. However, since there is a limit to the positive value of positive welfare but no limit to the negative value of negative welfare, proportional expansions (two million lives with positive welfare and two lives with negative welfare, three million lives with positive welfare and three lives with negative welfare, and so forth) will in the end yield a bad population.

Here, then, is the difference: if you believe, as a matter of fact, that our values cohere and place fundamental importance on coherence, whether because you think that is the way to get at the moral truth (2) or because you judge that human values do cohere to a large degree for whatever other reason and place fundamental value on coherence (1), you will not be satisfied with leaving your moral theory inconsistent. If, on the other hand, you see morality as continuous with your other life plans and goals (3), then there is no pressure to be consistent. So to (3), focussing on suffering-reduction and denying the absurd conclusion is fine, but this would not satisfy (1).

I think that, on closer inspection, (3) is unstable - unless you are Quirrell and explicitly deny any role for ethics in decision-making, we want to make some universal moral claims. The case for suffering-focussed ethics argues that the only coherent way to make sense of many of our moral intuitions is to conclude that there is a fundamental asymmetry between suffering and happiness, but then explicitly throws up a stop sign when we take that argument slightly further - to the absurd conclusion - because 'the need to fit all one's moral intuitions into an overarching theory based solely on intuitively appealing axioms simply cannot be fulfilled'. Why begin the project in the first place, unless you place strong terminal value on coherence (1)/(2) - in which case you cannot arbitrarily halt it?

Comment by sdm on Covid 7/30: Whack a Mole · 2020-08-04T09:26:07.716Z · LW · GW

It’s clearly the case that the public line about 70% herd immunity is still out there, but I think my broader point is served by that report. They have the obligatory ‘herd immunity is reached at 70% and there may be no immunity conferred’ caveat but then the actual model implies that in a worst case scenario 30% of the UK gets infected. You might speculate that they consulted the modellers for the model but not for the rest of it.