Or, alternatively, did Oxford really find a pharmaceutical company so incompetent that they did this by mistake, on top of giving an entire trial segment the wrong dose of vaccine the first time around? These are some rather epic screwups.
My experience working for a large company makes me not particularly surprised by this and I would give a decent amount of probability to this being an accident. I don't know enough about the specific procedures to be hugely confident but it does seem most likely to me.
If we're fairly confident that the wrong dose thing was an accident - I can't think of any reason to do this deliberately and then try to cover it up - then AstraZeneca obviously have the potential to make big mistakes.
One scenario would be that the person requesting / approving the press release is not the same as the person running the project but rather their boss or their boss's boss or even in another department. The press release approver is less involved in the minutiae and has remembered the 79% figure, maybe even goes so far as to check their e-mails that this is the correct figure (or check with someone else who checks their e-mails). Probably none of these people were in the meeting with the safety board.
I have had this experience myself on many occasions where my superiors have given information to customers that is outdated just from them not being as up to date or forgetting the latest results. Obviously I'd like to think something like this would have more care taken about it but the dosing debacle is suggestive that checking things isn't AstraZeneca's strong suit.
That combined with the 0% chance of this not being noticed suggests to me that this wasn't on purpose.
I wondered whether a decent amount of the cost increase was in changing from a hatchback to a sedan but I see that this is only $1,000 to go from the Mirage hatchback to sedan. And the Mirage sedan is the same size as a '90s Ford Escort sedan/station wagon so size doesn't explain it either.
Yeah, I didn't actually answer q18 either (possibly knite used my list as a basis?) for exactly that reason. Scott just put me in as the same as him for that question for the purposes of making an apples-to-apples comparison, which seemed fine - no idea what I would have put if I had answered!
Looking at the study it doesn’t look like the participants in the trial were randomised - rather if you wanted to use Taffix you could.
If I’m right I’m not sure what to make of it - you could have selection bias either way. More conscientious/concerned people took it or people with jobs where they had higher exposure levels took it. I would guess the former effect would be larger but not sure.
Yes, I agree Russia was unlikely to be above US for population reasons, I mentioned them more as an example of how bad under-reporting can be - I can't think of a way other than Covid to get 147k unaccounted for excess deaths but I could be missing something. I had concerns about this in all 3 of China, India and Brazil (although I guess there's the chance that we wouldn't get (accurate) excess deaths numbers anyway). 85% for 6 seems right but only dropping 5% for 17 seems low.
A commenter on Scott's post has made a case for India deaths being higher than US (enough to convince Scott it seems).
It's possible / likely that I'm still missing how difficult it is to win a parlay but:
Given Covid is seen as seasonal by the end of the year, there was very likely some wave in Autumn - the main question is whether it meets the conditions set out in 17
At the time of prediction it seemed almost certain that we would get below the thresholds within the next month or two
I expected (but wasn't certain) that a second wave would take us back above one of those thresholds.
There remains the question of having a wave in the middle (an Autumn wave would therefore not be the second wave). This is somewhere my model expected a US profile more like what happened in the UK/Europe, where cases/deaths were at a very low level for most of the Summer. This is a common thread in a few of my other predictions about US numbers - I generally underpredicted slightly but noticeably and this was a significant cause of that. So yeah, definitely an oversight from me in that regard.
I was going to write up my thoughts on this but it would be easier to just comment here.
I agree with your assessments for almost all of these. I was most impressed by your understanding of the politics in Q9 & 11 (China and Hydroxychloroquine) and predicting the lack of consensus for Q14 & 15.
A couple where I have a question:
1. On 6/7 (US highest toll official & unofficial) I had a bit more probability on Brazil (similar to India, more than China) – given large population (2/3rds US) and approach of the government.
Regarding official vs unofficial, you only mention deliberate lying but I had more expectation of insufficient / bad testing hiding true amounts than of lying. According to WSJ Russia’s excess deaths are 4.8x their official deaths (compared to 1.7x for US). This isn’t enough to overtake the US but I think it gives an idea of the scale of the potential problem. Mexico’s excess deaths are higher than Brazil’s despite having 35% fewer official cases. (India isn’t included in those numbers - excess deaths stats aren’t available I think).
Does that change your mind as to what a good prediction would have been?
2. On q17 (second wave) your prediction for p(17|16) is ~29%. Given that we are in a world where there is a general consensus that summer made things less bad, 29% seems low for a second wave even given the difficult operationalisation? My corresponding number was 50% which still seems better to me (although I messed up q16 so we actually predicted the same for 17 itself). In terms of which way it resolves, I think just numbers of deaths resolves this as clearly true (assuming by Autumn we mean 22 Sep – 21 Dec), both in terms of official result and intent:
Was there a second wave in Autumn? Yes, in late Autumn running into early Winter.
The problem is the notice given which results in the low correlation you mention. (by audit I don't really mean financial audits as I don't have experience of those - I'm more thinking of quality audits)
The question of how much more infectious B.1.1.7 is, is pretty useless without also referencing a generation time estimate.
Generally true, but in using contact tracing data the English analysis is answering the "how much more infectious" question directly rather than relying on inferring from relative growth rates and estimated generation times.
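To illustrate why the generation time matters for the growth-rate route: under a simple exponential-growth model with a fixed generation time, the implied infectiousness advantage is quite sensitive to the generation time you assume. A quick sketch (the 70%/week growth advantage is an illustrative number, not a figure from either analysis):

```python
import math

# illustrative: suppose the new variant's case counts grow 70% more per week
weekly_growth_advantage = 1.7
r_delta = math.log(weekly_growth_advantage) / 7   # per-day growth-rate advantage

# under R = exp(r * T_g), the infectiousness ratio is exp(r_delta * T_g),
# so the inferred advantage depends directly on the assumed generation time T_g
for gen_time in (4.0, 5.5, 6.5):                  # assumed generation times (days)
    extra = math.exp(r_delta * gen_time) - 1
    print(f"T_g = {gen_time} days -> {extra:.0%} more infectious")
```

The same observed growth advantage implies anywhere from ~35% to ~65% extra infectiousness across these plausible generation times, which is why contact tracing data that measures infectiousness directly sidesteps the problem.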
The 37% error does revise my estimate a bit for how confident I should be that it is <50% (although even correcting that it is probably still under 50% according to Zvi) but I still expect it to end up that side of the equation. If I was answering that survey now I'd be at 20% or so.
If, however, you previously had accepted that the English strain was more infectious, and the question was how much more infectious, then news of the answer could be good or bad. In this case, it’s good.
This is an estimated 37% increase in infectiousness.
Trying to think when logarithmic thinking makes sense and why humans might often think like that:
If I am in control of all (or almost all) of my risks then a pandemic where I am only taking a 0 person risk is very different from a pandemic where I am taking a 99 person risk. So moving from 0 to 1 in the presumably-very-deadly-pandemic where I am being super-cautious is a very bad thing. Moving from 99 to 100 in the probably-not-too-bad-pandemic where I'm already meeting up with 99 people is probably not too bad.
So thinking logarithmically makes sense if my base level of risk is strongly correlated to the deadliness of the pandemic. The more sensible route is to skip the step of looking at what risk I'm taking to give me evidence of how bad things are and just look directly at how bad things are.
In the 2 examples you give there are external reasons for additional base risk and these are not (strongly) correlated with the deadliness of the pandemic.
1. Salaries can't add much, especially if you're looking at mass production. If you're creating 500 doses then maybe it takes a couple of hours? Say $20/hour (looking at local job listings for this kind of role) and we get 8c/dose on salary. As you scale this is only going to go down.
2. It seems like vaccine trials can be done for a few hundred million although there is a big variation and I'm not completely sure whether the numbers given there include some manufacturing build up. If a large pharma company is going to be making lots of vaccine it seems like they should be able to achieve that for less than $1/dose.
3a. Taxes may add a decent few percent but can't be a main driver of cost
3b. Shipping costs for refrigerated goods are maybe 5c per 1000 miles per kg. That data is from a while back (1988!) and costs might be a bit higher for colder temperatures but I can't see this being a large fraction of the cost.
4a. For liability I note that at least AstraZeneca have struck deals in most countries to be exempt from such liabilities. It seems that in the US all COVID vaccines will benefit from this.
4b. Some companies (at least AstraZeneca and Johnson & Johnson) have said that they will be selling their COVID vaccines at cost. Even lacking this, I wouldn't expect corporate profits to be huge, even just from a PR point of view.
5. Risk of failed vaccine trials. If you only expect to have a 1 in 3 chance of successful stage 3 trial then the $1/dose from 2 becomes $3/dose to expect to break even. I'm not sure whether this risk is covered by governments - I think it was to some extent but am not confident.
Given Dentin's comment that the material cost is something like 10c/dose (which makes sense given how little it cost to double John's peptides order), I think most of the cost looks like it is in the trials and the risk of failure thereof, but this isn't enough to explain why companies aren't doing this. It's probably too late now anyway as vaccines already approved should have the pandemic under control before any new trials would be complete.
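Putting the guesses from points 1-5 together (all figures are the rough numbers above, not authoritative costs):

```python
# rough per-dose cost Fermi estimate using the guesses above
labour = 2 * 20.0 / 500          # 2 hours at $20/hr across a 500-dose batch
trial_cost = 300e6               # "a few hundred million" for trials
doses = 500e6                    # assumed large-pharma production run
trial_per_dose = trial_cost / doses
material = 0.10                  # ~10c/dose materials figure from Dentin
shipping = 0.05 * 3              # ~5c per 1000 miles, assumed ~3000 mile journey
p_success = 1 / 3                # assumed chance a phase-3 trial succeeds

# trial spend is sunk on failures, so scale it by the success probability;
# manufacturing costs are only incurred for successful vaccines
cost = labour + material + shipping + trial_per_dose / p_success
print(f"${cost:.2f}/dose")
```

On these numbers the answer comes out a little over $2/dose, dominated by the trial cost and the risk of trial failure, consistent with the conclusion above.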
One thing I've been thinking about regarding animal noises is the actual animal noise and the not-quite-phonetic spelling of the noise. My slightly late talking youngest son makes realistic animal noises rather than saying "moo" (which I think most children would say). Even now when I think of what noise a cow makes I initially think "moo", rather than of the actual noise itself.
I'm not sure that actually means anything - just an observation!
Say my life expectancy from now is 50 years and I work at an hourly salary of $30 (~$60k yearly salary) then I implicitly value the remaining 310,250 hours of my waking life at something like $9.3m total. This breaks down if offered larger probabilities of death and larger amounts of money (e.g. opportunity cost) but $10m seems like a sensible place to start for a Fermi calculation.
In this case we don't even have to worry about larger probabilities of death - the calculation here is essentially an expected gain of 1.2 days of life for $1000 which comes to about $50 per hour of waking life. Instead of making a vaccine only for myself I would be better just to take half a week unpaid leave and gain the same amount of time for a cost of only $600.
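The arithmetic behind this, for anyone checking (17 waking hours/day is the assumption baked into the 310,250 figure):

```python
waking_hours_per_day = 17        # assumed waking hours per day
hourly_wage = 30.0               # ~$60k/year

total_hours = 50 * 365 * waking_hours_per_day      # 310,250 remaining waking hours
life_value = total_hours * hourly_wage             # ~$9.3m implicit value

# the vaccine buys ~1.2 days (~20 waking hours) of expected life for $1000
vaccine_per_hour = 1000 / (1.2 * waking_hours_per_day)   # ~$49/hour

# half a working week (20 hours) of unpaid leave buys the same time directly
leave_cost = 20 * hourly_wage                      # $600
```

So the self-made vaccine costs roughly $50 per waking hour gained versus $30 per hour for the unpaid leave, which is the comparison driving the conclusion.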
You would need to do this in every country that you trade with and every country that they trade with (if we're trying to prevent damage to the economy). The proposal is even harder to implement in some countries.
I guess the question would be how long a respite this would give you before having to repeat.
Say we're going back completely to normal after the firebreak. Doubling times with the English strain were a little over a week with the fairly strict December measures. With no measures say it speeds up to doubling every 4 days. This might be optimistic given how fast the original strain spread early in the pandemic.
If we have 10 cases that we've failed to eradicate then we get to 10,000,000 cases in about 11 weeks. So we have to repeat this 4 times a year?
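The rebound arithmetic as a quick check (10 residual cases and a 4-day doubling time are the assumptions from above):

```python
import math

start_cases = 10            # cases we failed to eradicate
target_cases = 10_000_000   # back to a full-blown epidemic
doubling_days = 4           # assumed no-measures doubling time

doublings = math.log2(target_cases / start_cases)   # ~19.9 doublings needed
days = doublings * doubling_days                    # ~80 days
print(f"{days / 7:.1f} weeks between firebreaks")
```

About 11 weeks of growth between firebreaks, hence needing to repeat roughly 4 times a year.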
Interested to hear that these are now being provided to people in the UK who have COVID but are not in hospital and who are in a high-risk category.
I note that there was some discussion on LW about how useful they were likely to be as people would probably notice difficulty in breathing which usually comes with low oxygen levels. It turns out that with COVID oxygen levels can get low without people noticing - this was mentioned later on LW (April).
Probably nowadays what Shorty missed was the difficulty in dealing with the energetic neutrons being created and associated radiation. Then associated maintenance costs etc and therefore price-competitiveness. I chose nuclear fusion purely because it was the most salient example of project-that-always-misses-its-deadlines.
(I did my university placement year in nuclear fusion research but still don't feel like I properly understand it! I'm pretty sure you're right though about temperature, pressure and control.)
In theory a steelman Shorty could have thought of all of these things but in practice it's hard to think of everything. I find myself in the weird position of agreeing with you but arguing in the opposite direction.
For a random large project X, which is more likely to be true:
Project X took longer than expert estimates because of failure to account for Y
Project X was delivered approximately on time
In general I suspect that it is the former (1). In that case the burden of evidence is on Shorty to show why project X is outside of the reference class of typical-large-projects and maybe in some subclass where accurate predictions of timelines are more achievable.
Maybe what is required is to justify TAI as being in the subclass.
I think this is essentially the argument the OP is making in Analysis Part1?
I notice in the above I've probably gone beyond the original argument - the OP was arguing specifically against using the fact that natural systems have such properties to say that they're required. I'm talking about something more general - systems generally have more complexity than we realize. I think this is importantly different.
It may be the case that Longs' argument about brains having such properties is based on an intuition from the broader argument. I think that the OP is essentially correct in saying that adding examples from the human brain into the argument does little to make such an argument stronger (Analysis part 2).
(1) Although there is also the question of how much later counts as a failure of prediction. I guess Shorty is arguing for TAI in the next 20 years, Longs is arguing 50-100 years?
Flying machines are one example but can we choose other examples which would teach the opposite lesson?
Nuclear Fusion Power Generation
Longs: The only way we know sustained nuclear fusion can be achieved is in stars. If we are confined to things less big than the sun then sustaining nuclear fusion to produce power will be difficult and there are many unknown unknowns.
Shorty: The key parameters are temperature and pressure and then controlling the plasma. A Tokamak design should be sufficient to achieve this - if we lose control it just means we need stronger / better magnets.
I think this assumes the conclusion - it assumes that we know enough about intelligence to know what the key variables are and how effective they can be at compensating for other variables. Da Vinci could have argued how much more efficient his new designs were getting or how much better his new wings were but none of his designs could have worked no matter how much better he made them.
I don't disagree with you in general but I think the effect of Longs' argument should be to stretch out the probability distribution.
Also, the vaccine takes ~10 days to start having an effect. Plus say there is ~7 days delay from infection to test. 17 days ago Israel had vaccinated 6% so we wouldn't expect to see much effect in the case numbers yet.
On increased infectiousness of the UK strain, analysis of contact tracing data in the UK gives 30-50% more infectious (although looking at the data 30%-45% with central estimate of 35% is probably a better summary - see pages 14-16).
If I'm understanding the link correctly the 0.59 refers to the UK rate, the Denmark rate is 0.45. I'm also not sure whether the "0.45" and "0.59" are percentage increases or absolute increases in R - I think probably the latter (although if R=1 for the old strain as seems to be approximately true in Denmark then these are the same thing).
The paper he cites gives 72% increased infectiousness but with wide confidence interval:
The observed development in the occurrence of cluster B.1.1.7 in Denmark corresponds to an infection rate that is 72% (95% CI: [37, 115]%) higher than the average of other virus variants circulating in Denmark. (Google translate)
I suspect that this may come down via regression towards the mean - region-specific early data will bias towards regions where the new variant is growing fastest.
Am I being naive in thinking that most of the 50x comes from manufacturing the vaccine? In 1947 they had 650k vaccines ready to go, then got 7 pharma companies across the country to work round the clock to get them the rest. They were aiming to vaccinate 6.35 million, we are aiming to vaccinate 328 million just in the US (57x more doses needing to be manufactured). We'd expect to have more total capacity today of course but we have fewer companies doing the manufacturing.
I guess the mRNA vaccines are also more difficult to manufacture. Based on the cost, Pfizer/BioNTech's is 5x more expensive than Oxford/AstraZeneca's viral vector vaccine. The latter is planning on producing more than twice as many doses in 2021 (I don't know how the size and number of facilities compare but Pfizer are the bigger company).
Say we were giving every dose procured by the US evenly spread across the country - how many would we be doing a day in NYC?
I have tried something similar but not with money (I find my kids aren't very motivated by money - not sure why). In our case the losing party usually has to formally acknowledge the victor with some silly phrase - "Dad is an amazing human / genius" or "Mark is a pro and I'm a noob". This doesn't allow for different odds (maybe I could tailor different phrases to achieve this?) although I will sometimes offer it without them being held to anything if I am sufficiently confident.
I do think there is some risk with this approach that the child will have a bad time just to get the money
I was worried about this too but similarly haven't actually experienced it - I don't think my kids have the willpower / concentration to keep this up for long enough!
I get the impression that the US response is best modelled by how much action individuals choose to take based on how scared they feel / how fed up they are with COVID restrictions.
In the UK I think people’s response is generally more directly linked to the government’s rules and guidance (with a fair bit of going slightly beyond the rules and a little bit of completely ignoring them).
In the latter case things can be put in place before the 60 day delay (for instance Scotland didn’t have many cases of the new strain but took drastic action despite that because they knew it would grow quickly). In the former case I think your description here is a good model of the response - we could slow it down by reacting early but we probably won’t.
For people not in England it probably helps to say that restrictions were loosened somewhat at the beginning of December (following 3 weeks of stricter lockdown which ended on the 3rd IIRC) and are being tightened significantly today (26th). Worst affected locations were already tightened on the 20th.
The November lockdown was probably marginally stronger than the one now in place but comparing end of November drop rates to whatever happens over the next few weeks will probably be a good indicator as the December growth rates are confounded by different levels of lockdown and results being a mixture of the two strains.
The Berry-Esseen theorem uses Kolmogorov–Smirnov distance to measure similarity to a Gaussian - what’s the maximum difference between the CDFs of the two distributions across all values of x?
As this measure is on absolute difference rather than fractional difference it doesn’t really care about the tails and so skew is the main thing stopping this measure approaching Gaussian. In this case the theorem says error reduces with root n.
From other comments it seems skew isn’t the best predictor of how quickly kurtosis approaches that of a Gaussian; rather, kurtosis (and variance) of the initial function(s) is a better predictor, and skew only affects it inasmuch as skew and kurtosis/variance are correlated.
So my understanding then would be that initial skew tells you how fast you will approach the skew of a Gaussian (i.e. 0) and initial kurtosis tells you how fast you approach the kurtosis of a Gaussian (i.e. 3)?
Using my calibrated eyeball it looks like each time you convolve a function with itself the kurtosis moves half of the distance to 3. If this is true (or close to true) and if there is a similar rule for skew then that would seem super useful.
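The "half the distance" rule is in fact exact: variance and the 4th cumulant are both additive under convolution, so self-convolving doubles both and excess kurtosis (4th cumulant over variance squared) exactly halves. By the same argument skew (3rd cumulant over variance^1.5) shrinks by a factor of √2 per self-convolution. A numerical check on an arbitrary skewed pmf:

```python
import numpy as np

def excess_kurtosis(p, x):
    """Excess kurtosis (kurtosis - 3) of a discrete pmf p on support x."""
    mean = np.sum(p * x)
    c = x - mean
    var = np.sum(p * c**2)
    return np.sum(p * c**4) / var**2 - 3

p = np.array([0.7, 0.2, 0.1])   # an arbitrary skewed pmf on {0, 1, 2}
x = np.arange(3)

e = excess_kurtosis(p, x)
for _ in range(4):
    p = np.convolve(p, p)       # convolve the distribution with itself
    x = np.arange(len(p))       # support stays {0, 1, ...} with unit spacing
    e_next = excess_kurtosis(p, x)
    assert abs(e_next - e / 2) < 1e-9   # excess kurtosis exactly halves
    e = e_next
```

So the eyeballed rule holds exactly for kurtosis, with the √2-per-convolution analogue for skew.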
I do have some experience with distributions where kurtosis is very important. In one example I initially modelled with a normal distribution but found as more data became available that I was better off replacing it with a logistic distribution with thicker tails. This can be very important for analysing safety-critical components where the tail of the distribution is key.
The graph showing Kurtosis vs convolutions for the 5 distributions could be interpreted as showing that distributions with higher initial kurtosis take longer to tend towards normal. Can you elaborate why initial skew is a better indicator than initial kurtosis?
The skew vs kurtosis graph suggests that there’s possibly a sweet spot for skew of about 0.25 which enables faster approach to normality than 0. I guess this isn’t real but it adds to my confusion above.
I think there was some talk after last year about adding an "endorse nomination" button so that not everyone had to write their own comment to provide a nomination if they just agreed with what someone else had already written. Is this available / planned?
I did an analysis of how convincing the Oxford-AstraZeneca claim of 90% effectiveness is.
Unfortunately I inferred the numbers of infections in each group incorrectly according to this - the infections were split 3:27 between the half-full group and the full-full group, not 2:28 as I'd calculated. (Note that the naive interpretation of the numbers doesn't come to 90% or 62% effectiveness so I assume they're doing some corrections or something else which alters the result slightly.)
That means the 8:1 Bayes factor I originally calculated (in favour of half-full being more effective vs the two different regimens being equally effective) comes down to 2.9:1. In my book that isn't enough evidence to overcome the prior against the half-full dose regimen being more effective.
The above assumes that everything else about the groups is equal.
Having read the report linked in the OP I think the actual update should be noticeably lower, particularly as the half-full treatment group were younger than the full-full treatment group (or at least only the latter included anyone >55 years old).
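For anyone wanting to reproduce the mechanics (not my exact original calculation, which depended on a prior over effect sizes, but a sketch of the likelihood comparison): conditional on 30 treatment-arm infections, the number landing in the half-dose arm is binomial, with the arm probability set by arm sizes and an assumed relative risk. Here the alternative hypothesis assumes the headline 90% vs 62% efficacies, which is an illustrative choice and won't reproduce my 8:1 / 2.9:1 numbers exactly:

```python
from math import comb

n_half, n_full = 2741, 8895     # arm sizes (half-full vs full-full)
N = 30                          # total treatment-arm infections

def lik(k, rr):
    """P(k of N infections in the half-dose arm | half-dose risk = rr * full-dose risk)."""
    p = rr * n_half / (rr * n_half + n_full)
    return comb(N, k) * p**k * (1 - p)**(N - k)

rr_alt = (1 - 0.90) / (1 - 0.62)    # relative risk implied by 90% vs 62% efficacy

bf_corrected = lik(3, rr_alt) / lik(3, 1.0)   # 3:27 split (actual)
bf_original = lik(2, rr_alt) / lik(2, 1.0)    # 2:28 split (my original inference)
# the corrected split gives noticeably weaker evidence for "half dose better"
assert bf_corrected < bf_original
```

Whatever the exact alternative hypothesis, moving one infection from the full-full arm to the half-full arm cuts the Bayes factor substantially, which is the point of the correction.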
(I mean which interpretation will the evidence favor, not on whether they go ahead with the half-full as the standard dose)
FWIW I suspect I personally need this advice more than alkjash's advice. I've always had a feeling that most people are doing it wrong (e.g. managers who are always working late instead of learning to delegate) but I'm conscious that I want to be better at committing to things and seeing them through even if they're hard (or just inconvenient!).
I was confused as to why they did this too - alternative guesses I had were to increase the number of available doses or to decrease side effect severity.
However the site you link to has been updated with a link to Reuters who quote AstraZeneca saying it was an accident - they miscalculated and only noticed when side effects were smaller than predicted.
(numbers from here and here, numbers inferred are marked with an asterisk)
Some interesting results from the latest vaccine trial. The treatment group was split in two, one of which received 2 full doses of the vaccine, the other received a half dose followed by a full dose (separated by a month in both cases).
In the control group there were 101 COVID infections in ~11,600* participants.
With 2 full doses there were 28* infections in 8,895 participants.
In the half first dose condition there were 2* infections in 2,741 participants.
So does having a low first dose actually improve the immune response?
The best I can figure out the evidence is 8:1 Bayes factor in favour of the "low first dose better" hypothesis vs the "It doesn't matter either way" hypothesis.
Not sure what a reasonable prior is on this but I don't think this puts the "low first dose is better" hypothesis as a strong winner on posterior.
On the other hand the evidence is 14:1 in favour of "it doesn't matter either way" vs "both full doses is better" so at the moment it probably makes sense to give a half dose first and have more doses in total.
I'll be interested as to what happens as the data matures - the above is apparently based on protection levels 2 weeks after receiving the second dose.
I enjoyed this and had the same experience that when I was taught CLT referring to random variables I didn’t have a proper intuition for it but when I thought about it in terms of convolutions it made it a lot clearer. Looking forward to the next post.
One interesting graph would be the average points gained per matchup vs round number which would give a good indication of cooperation level and what kinds of strategies would work. It can kind of be inferred from the bots which are left but seeing a graph would make it easier to picture.