I did an analysis of how convincing the Oxford-AstraZeneca claim of 90% effectiveness is.
Unfortunately I inferred the numbers of infections in each group incorrectly according to this - the infections were split 3:27 between the half-full group and the full-full group, not 2:28 as I'd calculated. (Note that the naive interpretation of the numbers doesn't come to 90% or 62% effectiveness so I assume they're doing some corrections or something else which alters the result slightly.)
That means the 8:1 Bayes factor I originally calculated (in favour of half-full being more effective vs the two different regimens being equally effective) comes down to 2.9:1. In my book that isn't enough evidence to overcome the prior against the half-full dose regimen being more effective.
The above assumes that everything else about the groups is equal.
Having read the report linked in the OP I think the actual update should be noticeably lower, particularly as the half-full treatment group were younger than the full-full treatment group (or at least only the latter included anyone >55 years old).
(I mean which interpretation will the evidence favor, not on whether they go ahead with the half-full as the standard dose)
FWIW I suspect I personally need this advice more than alkjash's advice. I've always had a feeling that most people are doing it wrong (e.g. managers who are always working late instead of learning to delegate) but I'm conscious that I want to be better at committing to things and seeing them through even if they're hard (or just inconvenient!).
I was confused as to why they did this too - alternative guesses I had were to increase number of available doses or to decrease side effect severity.
However the site you link to has been updated with a link to Reuters who quote AstraZeneca saying it was an accident - they miscalculated and only noticed when side effects were smaller than predicted.
(numbers from here and here, numbers inferred are marked with an asterisk)
Some interesting results from the latest vaccine trial. The treatment group was split in two, one of which received 2 full doses of the vaccine, the other received a half dose followed by a full dose (separated by a month in both cases).
In the control group there were 101 COVID infections in ~11,600* participants.
With 2 full doses there were 28* infections in 8,895 participants.
In the half first dose condition there were 2* infections in 2,741 participants.
So does having a low first dose actually improve the immune response?
As best I can figure out, the evidence is an 8:1 Bayes factor in favour of the "low first dose better" hypothesis vs the "It doesn't matter either way" hypothesis.
Not sure what a reasonable prior is on this but I don't think this puts the "low first dose is better" hypothesis as a strong winner on posterior.
On the other hand the evidence is 14:1 in favour of "it doesn't matter either way" vs "both full doses is better" so at the moment it probably makes sense to give a half dose first and have more doses in total.
I'll be interested as to what happens as the data matures - the above is apparently based on protection levels 2 weeks after receiving the second dose.
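As a rough check on the counts quoted above, here is a minimal Python sketch of the naive efficacy calculation (one minus the ratio of attack rates). It assumes the inferred, asterisked figures are right and treats the ~11,600 controls as a single pooled comparison group for both regimens; the function name is just for illustration. It ignores whatever corrections the trial analysis applied.

```python
def naive_efficacy(cases_vax, n_vax, cases_ctrl, n_ctrl):
    """Return 1 - (attack rate in vaccinated) / (attack rate in control)."""
    return 1 - (cases_vax / n_vax) / (cases_ctrl / n_ctrl)


# Counts as quoted in the comment; the control size and the 2:28 split are inferred.
CONTROL = (101, 11_600)

full_full = naive_efficacy(28, 8_895, *CONTROL)
half_full = naive_efficacy(2, 2_741, *CONTROL)

print(f"full-full naive efficacy: {full_full:.1%}")
print(f"half-full naive efficacy: {half_full:.1%}")
```

With these inputs the naive figures land close to, but not exactly on, the reported 62% and 90%.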
I enjoyed this and had the same experience that when I was taught CLT referring to random variables I didn’t have a proper intuition for it but when I thought about it in terms of convolutions it made it a lot clearer. Looking forward to the next post.
One interesting graph would be the average points gained per matchup vs round number which would give a good indication of cooperation level and what kinds of strategies would work. It can kind of be inferred from the bots which are left but seeing a graph would make it easier to picture.
I have a theory that ASTB might end up helping EBMB somewhat too as some of ASTB's mass goes to the clones which keeps them alive for longer for EBMB to feed off. Still most should go straight to MB so you'll get the bigger boost, just probably not enough.
This norm does seem right to me but it is probably worth noting the asymmetry in audience between a typical personal blog post and these COVID updates. I feel like Zvi has earned the right to do this if he wants but would personally prefer the topics to be separated into different posts.
If someone decided they wanted to fund such a project how much would you estimate it would cost? (Let's say based on the assumption it didn't have to be publication quality, just good enough to persuade you it was likely correct)
I did a little digging on schools staying open in Europe and I suspect the decision to keep schools open in Europe has been partly driven by this paper from the ECDC (dated 6th August) which suggests the attack rate in schools is low.
They base this on:
Some contact tracing investigations (I think ~6 total) from schools, most of which showed at most 1 onward infection. However they note one study of an Israeli school where 2 index cases ended up being 178 cases in the school and more in the community.
Countries which reopened schools without noticeably increasing R and without significant school outbreaks
I do think there is some bias towards wanting to keep schools open (I think they downplay a couple of papers which suggest schools could be causing more infections) but actually the evidence is better than I expected. Of course they could be leaving out other studies completely and I wouldn't know.
From the paper they are actually planning on trying this within a Fortune 100 company so this at least must be allowable.
To that end, we have reached an agreement with a Fortune 100 company to demonstrate the value of our tool as part of their COVID-19 management practices. As we have shown there are cultural and age differences in coughs, future work could focus on tailoring the model to different age groups and regions of the world using the metadata captured, something we would like to test at the company site.
I think this is true if you're looking for near-perfect scientists but if you're assessing current science to decide who to invest in there are lots of things you can do to get better at performing such assessments (e.g. here).
In my 2020 predictions I mentioned that I found the calibration buckets used on e.g. SSC (50%, 60%, 70%, 80%, 90% and 95%) difficult to work with at the top end as there is a large difference in odds ratio between adjacent buckets (2.25 between 80% and 90%, 2.11 between 90% and 95%). This means that when I want to say 85% both buckets are a decent way off.
I suggested at the time using 50%, 65%, 75%, 85%, 91% and 95% to keep the ratios between buckets fairly similar across the range (maximum 1.89) and to work with relatively nice round numbers.
Alternatively I suggested not having a 50% bucket as answers here don't help towards measuring calibration and you could further reduce the gaps between buckets without increasing the number of buckets.
At the time I couldn't come up with nice round percentage values which would keep the ratios similar. The best numbers I got were 57%, 69%, 79%, 87%, 92%, 95% (max difference of 1.78) which seemed hard to work with as they're difficult to remember.
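For anyone who wants to check the ratios or try their own buckets, here is a small Python sketch. It converts each percentage to odds and reports the largest ratio between adjacent buckets; the function and variable names are my own.

```python
def odds(percent):
    """Convert a probability in percent to odds in favour, e.g. 80 -> 4.0."""
    return percent / (100 - percent)


def max_adjacent_odds_ratio(buckets):
    """Largest ratio of odds between neighbouring buckets (buckets in ascending order)."""
    return max(odds(hi) / odds(lo) for lo, hi in zip(buckets, buckets[1:]))


schemes = {
    "SSC": [50, 60, 70, 80, 90, 95],
    "with 50% bucket": [50, 65, 75, 85, 91, 95],
    "without 50% bucket": [57, 69, 79, 87, 92, 95],
}

for name, buckets in schemes.items():
    print(f"{name}: max adjacent odds ratio = {max_adjacent_odds_ratio(buckets):.2f}")
```

Swapping in a different list of percentages makes it quick to see whether a candidate scheme keeps the gaps even.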
An alternative scheme I've come up with is not to use percentage values but to use odds ratios. The buckets would be:
The percentage equivalents are similar to the scheme mentioned previously with the same max difference between buckets. I prefer this as it has a simple pattern to remember and adjacent buckets are easy to compare (e.g. for every 3 times X doesn't occur, would I expect X to occur 7 or 12 times?).
I've tried this out and found it nice to work with (not initially but after getting used to it) but that may just be a personal thing.
Reading it a bit more carefully, I guess for one-sided bets there's a chance that you are already in the position that the bet is not profitable so you already don't need to update. I guess the title threw me off a bit - with two sided bets you have to do one or the other (or both), with one sided you don't.
Hmm, that paper references another paper for its >50% infected claim but the paper it references only has a 24% seropositivity rate. It does suggest 53.5% infection but that's based on a naive SIR model which I don't expect to give particularly accurate results for that kind of thing.
Good to see a detailed examination of reinfections though - that's the kind of thing I've been hoping to see.
Another, more recent, paper does find 66% seropositivity in migrant workers (who make up 60% of the population). However the sample seems to have been selected strongly for people who had had Covid as 20% of the sample had already had positive PCR results, compared to ~4% of the total population.
My confusion is that I don't think the explanation in this post really resolves why CliqueZviBot outscored the clique bot average (multiplicatively) in every round last time out. This should only happen (1/2)^10 = 1/1024 of the time for a given bot, or 1/128 of the time given 8 bots.
To be fair this is only me reading it off the graphs so I could be wrong with this exact analysis but I do think CliqueZviBot was a very strong and consistent outlier and that this explanation doesn't resolve my confusion as to why.
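The arithmetic behind those odds, as a short Python sketch. It assumes each bot has an independent 50% chance of beating the clique average in any given round, over 10 rounds, which is a simplification (bots beating the average are not really independent of each other).

```python
from fractions import Fraction

ROUNDS = 10          # rounds in which the bot beat the clique average
N_CLIQUE_BOTS = 8    # clique bots that could have been the outlier

# Chance that one particular bot beats the average in every round.
p_one_bot = Fraction(1, 2) ** ROUNDS

# Simple upper bound for "at least one of the 8 bots does it".
p_any_bot_approx = N_CLIQUE_BOTS * p_one_bot

# Same quantity if the bots were fully independent (slightly smaller).
p_any_bot_indep = 1 - (1 - p_one_bot) ** N_CLIQUE_BOTS

print(p_one_bot, p_any_bot_approx, float(p_any_bot_indep))
```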
I love your "Lucy pulling away the football" ideas and it's super impressive that this can outscore tit-for-tat.
I'm fairly sure the first idea is better than the last two if you want to play it even with a large number of clique bots as what matters is not how many times more you're getting than them but how many times more they are losing than you (from the normal 2.5 per round). Idea one forces them to lose 3x more than you and this means it's worth it if there are fewer than 3x more clique bots than mimic bots (ignoring all other bots). If DefinitelyNotCollusionBot and PasswordBot played the same tactic against the clique bots it may have been worth it even with 8 clique bots.
I think that both effects are likely and that this will add noise to the measure.
Noise is my main concern about your experiment in general - with only 10 samples in each treatment any effect would have to be large to reliably show up. If you were doing a t-test with p<0.05 then you would need Cohen's d of 1.3 to get a significant result. This would be the equivalent of PHTG having the effect of moving a median date to a 90th percentile date which feels unlikely.
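Here is a rough Python sketch of where numbers like these come from, using only the standard library. It assumes a two-sided, equal-variance two-sample t-test with 10 per group (so 18 degrees of freedom and a critical t of about 2.101) and normally distributed outcomes. It shows the smallest observed d that would reach p<0.05, a simulated power for a true d of 1.3 (my reading is that the 1.3 figure corresponds to roughly 80% power, which is an assumption on my part), and the percentile that a 1.3 SD shift corresponds to.

```python
import math
import random
import statistics

N = 10          # samples per treatment
T_CRIT = 2.101  # two-sided 5% critical value of Student's t with 18 df


def min_significant_d(n=N, t_crit=T_CRIT):
    """Smallest observed Cohen's d that reaches significance (t = d * sqrt(n / 2))."""
    return t_crit * math.sqrt(2 / n)


def simulated_power(true_d, n=N, t_crit=T_CRIT, trials=5000, seed=0):
    """Monte Carlo estimate of the chance of a significant result for a given true d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n)]
        pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
        t = (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(pooled_var * 2 / n)
        if abs(t) > t_crit:
            hits += 1
    return hits / trials


print(f"smallest significant observed d: {min_significant_d():.2f}")
print(f"simulated power at true d = 1.3: {simulated_power(1.3):.2f}")
print(f"percentile reached by a 1.3 SD shift: {statistics.NormalDist().cdf(1.3):.1%}")
```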
Obviously you're being sensible and not being frequentist but the underlying problem is still there - even if PHTG has a decent sized effect the experiment might not show it, or, worse, if PHTG makes things slightly worse it could show up as being good in your experiment.
I would suggest that you try to work out a power calculation (even just by setting up your planned calculation and plugging in some fake numbers to see what happens) - if PHTG is slightly harmful to your chances (say 20% decreased chance of getting a second date) what are the odds that the experiment will lead you to accept PHTG?
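As an example of the "plug in some fake numbers" approach, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: 10 dates per condition, a made-up 50% baseline chance of a second date, and a made-up decision rule of "accept PHTG if the PHTG group got strictly more second dates". Your actual planned calculation would replace that rule.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_treatment_looks_better(p_ctrl, p_treat, n=10):
    """P(treatment group gets strictly more second dates than control)."""
    return sum(
        binom_pmf(c, n, p_ctrl) * binom_pmf(t, n, p_treat)
        for c in range(n + 1)
        for t in range(c + 1, n + 1)
    )


BASELINE = 0.5  # hypothetical chance of a second date without PHTG

no_effect = prob_treatment_looks_better(BASELINE, BASELINE)
harmful = prob_treatment_looks_better(BASELINE, BASELINE * 0.8)  # 20% worse

print(f"PHTG looks better when it does nothing: {no_effect:.0%}")
print(f"PHTG looks better when it is 20% worse: {harmful:.0%}")
```

Under these made-up inputs a mildly harmful PHTG still comes out ahead a meaningful fraction of the time, which is the worry.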
As an aside, have you read this on putanumonit? It presents an alternative to PHTG which you might find interesting.
If PHTG is successful, do you expect more or less eye contact from your dates? If PHTG raises your status this implies it puts your date at relatively lower status and according to your list they would make less eye contact. However your post suggests that you'll take it as a sign of interest as is usually the case.
So I notice that I'm still confused as to how CliqueZviBot outperformed the other clique bots so consistently in the previous game (and still is! At least for this one round). I assume that the 300-200 goes in the favour of whoever won the starting tiebreaker. No bot should be able to consistently win this?
I have been working on this on and off for a couple of years as a potential career switch but akrasia is a big barrier so most of my suggestions are to combat this.
I found this a lot harder but I do think the practice with the previous exercises helped me on this one to search wider for solutions, especially as I got nearer the end. Some are things I've already done/started but could do more/better.
Schedule fixed times each week
Reminder on phone
Get someone to ask me how it’s going every week
Join learning group
Join and do Kaggle projects
Look for alternatives to Kaggle
Find a mentor
Sign up for a course
Sign up for more expensive courses to motivate me
Beeminder or something
Ask for suggestions on courses
Read up about Akrasia in general
Look up jobs and requirements
Find low effort content to combine with high effort
Combine with current job
Look to get reassigned to move more towards field
Look for opportunities within current job to use skills
Break down into smaller tasks
Make full list of everything I want to learn, break it into manageable chunks
Delete games / other distractions from phone
Turn off all push notifications
Timelock phone for times I should be learning
Stop being lazy
Just give up (cut my losses)
Choose something else to learn
Be more direct in what I actually need to learn instead of what is suggested in courses
Focus on the bits I actually enjoy
Take a couple of weeks break guilt-free, come back refreshed
Play music whilst working
Look if anyone has made Anki Decks
Read data science blogs
Watch data science youtube stuff
Don’t do too much at once (avoid burnout)
Do lots all at once (Do it when I have the motivation)
Take more regular breaks whilst doing it
Just start and see if I feel like it once I’ve been doing it for 10 minutes
Get more comfy work chair
Get better equipment
Focus on the goal
Get more sleep for better energy
Reward myself after I’ve done something
Talk about my successes with people
Write up what I’ve learnt
Create a fun exercise:
Ex1: Robot raspberry pi
Ex2: Checkers learner
Ex3: Minecraft mod
Ex4: Look up fun projects online
Ex5: Write bot for Darwin game (and then fail to submit it…)
I think compound returns is the wrong model as it stands - logarithmic growth seems more appropriate with the current setup. I would expect completing 5 babble challenges to give 80-90% of the benefit of doing 7.
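To make the 80-90% figure concrete, here is a small Python sketch comparing 5 challenges against 7 under two simple logarithmic models. The exact functional forms (ln(n) and ln(1+n)) are my own assumptions about what "logarithmic growth" might look like, not something established.

```python
import math


def relative_benefit(done, total, model):
    """Benefit of `done` challenges as a fraction of the benefit of `total`."""
    return model(done) / model(total)


# Two hypothetical benefit curves: ln(n) and ln(1 + n).
log_share = relative_benefit(5, 7, math.log)
log1p_share = relative_benefit(5, 7, math.log1p)

print(f"ln(n) model:     {log_share:.0%} of the benefit")
print(f"ln(1 + n) model: {log1p_share:.0%} of the benefit")
```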
If we practice both babble and prune then the benefits of the two probably do compound somewhat with each other such that doing 2 babble and 2 prune is significantly better than doing 4 of either but this doesn’t really justify streak measuring.
If consistency rather than direct benefit is the target then streaks make some sense. I would say in that case that I would need to be persuaded that this is the correct exercise to learn consistency. At the moment I would categorise it as definitely worthwhile (hence the 3 out of 4 commitment) but not enough to super-prioritise it enough to make a streak-worthy commitment.
I hereby commit to doing at least 3 of the remaining 4 of these (I don't know if I'll have time every week).
I don't like measuring things by streaks - if you want to do a list I think doing it by total number of challenges completed is better. Streaks are a less accurate indication of effort put in or potential gains achieved and have more potential to create unhealthy incentives.
(I think this instinct comes from something like Noticing the Taste of Lotus, although I'm not really sure how strongly it applies here)
Change appearance of pen (different colour?) as long as it is reversible
Inside a clock
Ways to make sure it gets to him if I die:
Start a conspiracy to keep it
Publicise to world what is happening, bring the evil forces out into the light
Hide it at the end of a treasure hunt with each clue more fiendish than the last. If you die, make sure Einstein gets the first clue, he can take it from there
Hide it in the Bern patent office in a packet with his name on it
Write a set of instructions on how to get the pen, give letter to Western Union to guarantee delivery at the correct time
Go to Colorado, hide addressed envelope with Brachiosaurus fossils and re-cover the fossils, ready for them to be dug up in 1903.
Ways to confound evil forces:
Learn how to lose a tail when on my way to hide the pen
Pretend to hide it in multiple places
Pretend to give it to multiple people so they don’t know who to follow
Go all-out attack on the evil forces
Put booby traps around hiding place
Put booby traps elsewhere to misdirect
Make multiple indistinguishable copies and hide all of them
Ensure evil forces don’t know I ever had it
If unsuccessful, give it to someone else, commit suicide so they can’t find out who I gave it to
Don’t worry about it, he can probably use whatever pen and still get the same result
Otherwise, use magic pen to try to write miracle papers myself
Buy lots of identical pens which will work just as well
Reverse engineer pen if no similar pens available
Apply magical foresight to foresee all attempts at theft and prevent them
Rely on determinism – apparently I already know he will write with this pen
Use my knowledge of modern technology to make loads of money to help with effort
Hide it in one of infinitely many sets each of which contains infinitely many elements, wait for Ernst Zermelo to formulate his axiom of choice (1904) to allow me to arbitrarily pick it from the relevant set
I'm happy to see the effect of stomach acid on metal has been studied scientifically. I didn't really expect this to work but for thin metal it would be surprisingly effective (63% mass reduction of razor blade after 24 hours). Given 10 years...
Global yearly deaths on roads are ~1.35mil (source).
In the US (well, 23 states) there was a 6% drop in the first 6 months of the year (source - see table halfway down). Naive approximation gives 81k lives saved globally if this turns out to be the yearly average.
Alternative calculation: In the max lockdown month(s) deaths were down ~40-70% (Various European countries, Turkey, UK). Assuming 2 months of severe lockdown this would give 124k lives saved.
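Both estimates as a short Python sketch, so the inputs are easy to change. Using the midpoint of the 40-70% range (55%) is my assumption about how to collapse the range to a single figure; it reproduces the ~124k number.

```python
GLOBAL_ROAD_DEATHS_PER_YEAR = 1_350_000

# Estimate 1: apply the 6% US half-year drop to the whole world for a full year.
saved_from_us_rate = GLOBAL_ROAD_DEATHS_PER_YEAR * 0.06

# Estimate 2: a 40-70% drop during 2 months of severe lockdown.
LOCKDOWN_MONTHS = 2
DROP_LOW, DROP_HIGH = 0.40, 0.70
deaths_in_lockdown_window = GLOBAL_ROAD_DEATHS_PER_YEAR * LOCKDOWN_MONTHS / 12

saved_low = deaths_in_lockdown_window * DROP_LOW
saved_high = deaths_in_lockdown_window * DROP_HIGH
saved_mid = deaths_in_lockdown_window * (DROP_LOW + DROP_HIGH) / 2

print(f"from US rate:  {saved_from_us_rate:,.0f}")
print(f"from lockdown: {saved_mid:,.0f} (range {saved_low:,.0f} to {saved_high:,.0f})")
```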
Cause overheat in phone battery to explode way out
Chip away at wall with phone
Upload consciousness to phone, escape via wifi
Wrap shirt around fist and punch way out
Use wifi to request help from police
Use wifi to request help from friends
Use wifi to order package to own location, hope delivery man is resourceful
Dig out through floor
Craft phone into lock pick
Kick door in
Use phone part to unscrew hinges
Remove door handle etc to get to lock mechanism
Use steel from boot toecap to dig through wall
Smash head against wall
Are there lights in the room? If there are I’ll use them somehow
Jump high enough to smash through ceiling
Wait until someone opens the door
Do lots of exercise to become super strong, then smash out
Escape in my dreams
Examine floor, wall, ceiling, use strongest material to smash weakest
Found mega successful company via wifi, use profits to fund search and rescue for me
Become ace hacker, use global surveillance to find location and organise rescue
Ask for ideas on how I should escape in a LW post where I pretend that this is a hypothetical question intended to promote creativity
Use phone part to create laser to bore way out (research this on wifi!)
Shout for help
Boost wifi signal to get attention of anyone passing
Increase power to phone speakers to create sonic wave to knock down walls
Use phone speaker to create resonant frequency for wall to vibrate it down
Find out how air is being replenished – look for weak points
Investigate what weird physical/chemical effect is enabling me and my phone to store so much energy – use this new discovery to power my way out
Plead nicely with the people on the other side of the door
Offer bribes to any guards
Just unlock the door – maybe it doesn’t need a key
If room is small enough, climb up walls by wedging between opposite walls. Escape through ceiling
Chip hand/footholes in the walls to escape through ceiling
Wait for lock/hinges to rust, encourage this by breathing on them
Use phone battery acid to etch way out (can’t believe it took me this long for that one)
Force self to be sick, use stomach acid similarly
Use clothes/poop to create fire (light via phone spark). Burn any vulnerable parts of lock/door
Scrape out using finger/toenails (I have 10 years worth of nails)
If I find myself in this situation I hereby pre-commit myself to using all of my available resources not to escape but to rain down hellfire remotely on whoever put me in there (avoid getting put in this situation in the first place)
Realise that as an introvert I may actually be in a close approximation of heaven and live out the rest of my life in peace
Use zoom/whatsapp to virtually escape, even if physical body is still stuck
Place myself in suspended animation and await rescue
Use belt buckle to dig way out
Try the doorhandle – it is definitely locked, right?
Use glasses to focus light from phone torch to melt way out
Rub hands together until sore, use saw to cut phone in half, put 2 halves together to make whole, climb out through the hole
Meta note: Spoilers don’t function correctly on compressed comments on the front page so you get to see the first few words. Generally this isn’t a problem but I can imagine there would be times that it would.
I don’t know how many of the 2000 would do the same thing but switching to GW for the day was fairly obvious to me. On the other hand I use GW on and off so this maybe gave me an advantage but I think the post on surviving the outage suggested doing that too. Short of checking GW traffic I guess it’s hard to know how many did this.
We often distinguish between safety critical and non-safety critical components. The latter make up about 95% of components in my business and in general the thing we care most about is average performance.
In safety critical components we care about the worst component (material / manufacturing defect etc.) in e.g. 1,000,000. Otherwise >1 in 1,000,000 brakes fail and the vehicle runs someone over or drives into a canal.
The examples that you give of jokey but serious things are almost all non-safety critical things (except the dominance contest but I think that's quite a different example). If I miss that embedded agency is about something serious then that doesn't really matter - someone who makes that mistake is probably not really who it is important to make understand. The overall effect of the series is the most important thing.
My impression is that the message you sent is great for average performance (and that the most natural way to read it is as you intended) but that it isn't optimised for communicating with the biggest exception in 270. The person who shares the least common knowledge about the ritual or reads the message the fastest or has the prior you mention or a prior that the admins sometimes do pure jokes (e.g. April Fools) or whatever - that person is really the person you need to be writing for. That most people understood it correctly is largely irrelevant.
I feel like this is a huge lesson that this experience hammered home for me.
(The message changed slightly from last year to this - one section I particularly note is:
You’ve all been given the opportunity to show yourselves capable and trustworthy.
in 2019 was changed in 2020 to:
You've all been given the opportunity to not destroy LessWrong.
I'd be curious to know why this was changed as the former seems better optimised for setting expectations.)