I had nudging cached in my memory as, more or less, a UX movement.
Want to increase charity donation at your company? Make it opt-out, rather than opt-in. Want to increase completion rates of your survey? Make it shorter.
And so forth.
So I was surprised by Jacob Falkovich claiming that nudgerism caused the elaborate psychological theorising used to inform covid policy. Many such policies seemed to rest on oddly specific, second-order claims: for example, about expected resistance to challenge trials, or about vaccine hesitancy. Those arguments venture much more heavily into psychoanalysing people, rather than appealing to simple behavioural economics and basic UX.
(My cached memory of the nudge movement might be too narrow, though)
Habryka, is the reasoning that politicians have a real incentive to accurately predict public response -- because it entirely determines whether they remain in power -- whereas behavioral scientists have a much weaker incentive, compared to the dominant incentive of publishing significant results?
I haven't looked at the links, but making problem lists like this seems really cool. I'm glad they tried it, and then followed up.
I'm curious whether you know anything about why they tried it?
Hamming's original lecture talks about how most scientists he had lunch with sort of flinched away from their field's Hamming problems. He asked why they weren't working on them. It's implied that the conversation usually didn't go down very well, and the next day he had to eat lunch with someone else.
Why were things different for the Accounts of Chemical Research people? Unusual amounts of curiosity, courage, accident, or something else?
Comment by jacobjacob on [deleted post]
There is an argument that the use of willpower is undesirable.
I went down the neoantigen rabbithole, and it was quite interesting.
I liked this talk on "Developing Personalized Neoantigen-Based Cancer Vaccines".
It seems a core part of their methodology is using machine learning to predict which peptides will elicit a T-cell response, based on sequencing the patient's tumour. (Discussed starting from around 11 minutes in.)
They use this algorithm, which seems to be a neural network with a single hidden layer just ~60 neurons wide, and some amount of handcrafting of input features (based on papers from 2003 and 2009). I wonder what one could accomplish with more modern tools (though I haven't yet read the papers deeply enough to have a model of how big of a bottleneck this is to creating an effective treatment, and how much room for improvement there is).
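For a sense of scale, here is a toy sketch of a predictor with that shape, a single hidden layer of ~60 units. This is purely illustrative: the feature dimension, weights, and function names below are all invented, and it is not the actual algorithm from the papers.

```python
import math
import random

random.seed(0)

# Toy sketch, not the algorithm from the talk: a single hidden layer of
# ~60 units mapping a handcrafted peptide feature vector to a predicted
# probability of eliciting a T-cell response. All dimensions and weights
# here are made up for illustration.
N_FEATURES, N_HIDDEN = 20, 60

W1 = [[random.gauss(0, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_FEATURES)]
b1 = [0.0] * N_HIDDEN
W2 = [random.gauss(0, 0.1) for _ in range(N_HIDDEN)]
b2 = 0.0

def predict(features):
    """Predicted probability that a peptide elicits a T-cell response."""
    hidden = [math.tanh(sum(f * W1[i][j] for i, f in enumerate(features)) + b1[j])
              for j in range(N_HIDDEN)]
    logit = sum(h * w for h, w in zip(hidden, W2)) + b2
    return 1 / (1 + math.exp(-logit))

p = predict([random.gauss(0, 1) for _ in range(N_FEATURES)])
```

A model this small has on the order of a thousand parameters, which is part of why I wonder how far more modern tools could push it.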
I'm updating fairly hard on the four radvac team members who found antibodies using custom-built ELISA assays (rather than commercial tests). I wasn't super compelled by arguments that those might be false positives, but I do find it important that we don't know the denominator of how many of them took that test.
It maybe moved my probability from 17% to 45% that it would work for me (so still less optimistic than Wentworth!)
Though I think even a 5% chance of it working would make the original question worth asking. As they say: huge if true :)
(Also, the more competent version of me who solved it in a month would need to be competent on many other dimensions as well, not just knowing about peptide vaccines. Thinking about it, just the peptide delivery time could be longer than a month, as could the vaccine booster schedule. I do think there are worlds where it's actually a month, but I'll update the question to say "a few")
This actually flies in the face of my sense that Bell Labs was able to build the transistor because of its resources and the particular knowledge and expertise it had built up over 20 years. Possibly their ideas were just getting spread around via their external contacts; or perhaps solid-state physics was simply taking off generally.
Woah, this was striking to me. It seems like pretty big evidence against Bell Labs actually having a secret sauce of enabling intellectual progress. I would have to look into it more, though. (Also the update is tempered by the fact that another argument for Bell Labs' greatness is the sheer number of inventions, like UNIX, satellites, lasers, information theory, and other stuff.)
Well, this post was just crying out for some embedded predictions! So here we go:
Thanks johnswentworth for help with some of the operationalisations!
I included many different ones, as I think it is often good to try to triangulate high-stakes questions via different operationalisations. This reduces some of the "edge-case noise" stemming from answering vague questions in overly specific ways.
Yep, this is indeed a reason proper scoring rules don't remain proper if 1) you only have a small sample size of questions, and 2) utility of winning is not linear in the points you obtain (for example, if you really care about being in the top 3, much more than any particular amount of points).
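As a minimal sketch of the "proper" part: under the Brier score, your expected loss is minimized by reporting your true belief, which is exactly the property that stops mattering once utility is a nonlinear function of points. (Numbers below are illustrative.)

```python
# Sketch: negative Brier score is a proper scoring rule, so expected
# Brier loss is minimized by reporting your true probability.

def expected_brier(report, true_p):
    # Expected Brier loss if the event occurs with probability true_p
    return true_p * (1 - report) ** 2 + (1 - true_p) * report ** 2

true_p = 0.7
losses = {r / 100: expected_brier(r / 100, true_p) for r in range(101)}
best_report = min(losses, key=losses.get)
print(best_report)  # 0.7: honest reporting minimizes expected loss
```

Once you care about, say, finishing top 3 rather than about points per se, the report that maximizes your chance of that can differ from your true belief, and the rule is no longer proper in practice.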
Some people have debated whether it was happening in the Good Judgement tournaments. If so, that might explain why extremizing algorithms improved performance. (Though I recall not being convinced that it was actually happening there). When Metaculus ran its crypto competition a few years ago they also did some analysis to check if this phenomenon was present, yet they couldn't detect it.
And in doing so, I feel proud to assume the role of Patron Saint of LessWrong Challenges, and All Those Who Test Their Art Against the Territory.
Some reasons I'm excited about this post:
1) Challenges help make LessWrong more grounded, and build better feedback loops for actually testing our rationality. I wrote more about this in my curation notice for The Darwin Game challenge, and wrote about it in the various posts of my own Babble Challenge sequence.
2) It was competently executed and analysed. There were nice control groups; the choice of scoring rule was thought through (including what the results would have been under other scoring rules); and the data was analysed in a bunch of different ways that managed to be comprehensive while still maintaining my curiosity and being very readable.
Furthermore, I can imagine versions of this challenge that would either feel butchered, in such a way that I felt like I didn't learn anything from reading the results, or needlessly long and pedantic, in such a way that getting the insight wouldn't have been worth the trek for most people. Not so with this one. Excellent job, UnexpectedValues.
3) I want to celebrate the efforts of the participants, some of whom devised and implemented some wonderful strategies. The turtle graphic fingerprints, gzip checks, mean-deviation scatter, and many others were really neat. Kudos to all who joined, and especially the winners, Jenny, Reed, Eric, Scy, William, Ben, Simon, Adam and Viktor!
I would love to see more activities like these on LessWrong. If you want to run one and would like help with marketing, funding for prizes, or just general feedback -- do send me a message!
Congratulations on your first LessWrong post! :) (Well, almost first)
As a piece of feedback, I will note that I found the "Rosenberg's crux" section pretty hard to read, because it was quite dense.
I feel like if I had read the original letter exchange, I could then have turned to this post and gone "a-ha!" In other words, it felt like a useful summary, but it didn't give me the original generators/models, such that I could pass the intellectual Turing test of what Dennett and Rosenberg actually believe.
By comparison, I think the section on the "cryptographer's constraint" was clearer, since it was more focused on elaborating a particular principle and why it was important, along with considering some concrete examples in more depth.
The forecasters were only quite loosely selected for "some forecasting experience". Some of them I know to be very able forecasters; others are much less experienced, and I don't think they're affiliated that much with the rationality or effective altruism communities.
I have a beginning draft of a survey for the Secret of Our Success. I hoped I could finish it up yesterday, but instead I had to work on shipping the LessWrong Books. Will see if I can get it out later this week.
Have at least one 2h conversation about a particular post, and write up a review after, almost regardless of how I feel the conversation went
Didn't happen and didn't really come close.
My main post-mortem is that I had multiple calendar reminders about the commitment, but I postponed all of them into the future, until it was the last weekend and I was out of time. I should've spent more meta-cognition during some of those reminders, thinking about how much time I would need to complete the tasks on time.
Author here: I think this post could use a bunch of improvements. It spends a bunch of time on tangential things (e.g. the discussion of Inadequacy and why this doesn't come through in textbooks, spending a while initially setting up a view to then tear down).
But really what would be nice is to have it do a much better job at delivering the core insight. This is currently just done in two bullets + one exercise for the reader.
Even more important would be to include JenniferRM's comment which adds a core mechanism (something like "cultural learning").
Overall, though, I still stand by the importance of the underlying concept; and think it's a crucial part of the toolkit required to apply economic thinking in practice.
Formulations are basically just lifted from the post verbatim, so the response might be some evidence that it would be good to rework the post a bit before people vote on it.
I thought a bit about how to turn Katja's core claim into a poll question, but didn't come up with any great ideas. Suggestions welcome.
As for whether the claims are true or not --
The "broken parts" argument is one counter-argument.
But another is that it matters a lot what learning algorithm you use. Someone doing deliberate practice (in a field where that's possible) will vastly outperform someone who just does "guessing and checking", or who Goodharts very hard on short-term metrics.
Maybe you'd class that under "background knowledge"? Or maybe the claim is that, modulo broken parts, motivation, and background knowledge, different people can meta-learn the same effective learning strategies?
I made some prediction questions for this, and as of January 9th, there interestingly seems to be some disagreement with the author on these.
Would definitely be curious for some discussion between Matthew and some of the people with low-ish predictions. Or perhaps for Matthew to clarify the argument made on these points, and see if that changes people's minds.
I experimented with extracting some of the core claims from this post into polls:
Personally, I find that answering polls like these makes me more of a "quest participant" than a passive reader. They provide a nice "think for yourself" prompt that then makes me look at the essay with a more active mindset. But others might have different experiences; feel free to provide feedback on how it worked for you.
(You can find a list of all 2019 Review poll questions here.)
I can't quite tell why you think Twitch is bad. It's subject to network effects and is kind of a social media company; is that why? And I don't know what Scale.com is, other than that it's some AI company.
Scale's mission is something like accelerating AI progress, and they have no safety department. So ¯\_(ツ)_/¯ For Twitch I think a bunch of good stuff happens there (chess streamers, Ed Kmett streaming Haskell, or just great gamers), but they're also in a domain where clickbait and similar Goodharting dynamics are strong, and in the worlds where it gets really big I expect those to dominate.
On the topic of tails, I wonder if your distribution would've come out differently had the scale been -10, -1, 0, 1, 10.
I think I would rarely have assigned 10s, due to it being a complex question and this just being a very rough draft.
Another interesting question is whether weighing the rankings by market cap would have made a difference. (But YC didn't make valuations available in their data, so it would require ~30 min of data entry.)
I wrote up a longer, conceptual review. But I also did a brief data collection, which I'll post here as others might like to build on or go through a similar exercise.
In 2019 YC released a list of their top 100 portfolio companies ranked by valuation and exit size, where applicable.
So I went through the top 50 companies on this list, and gave each company a rating from -2 for "Very approval-extracting" to 2 for "Very production-oriented".
To decide on that number, I asked myself questions like "Would growth of this company seem cancerous?" and "Would I reflectively endorse using this product?"
Companies that scored highly include Doordash, Dropbox and Gusto (all 2's), and companies that scored low include Scale.com (which builds tooling to speed up AI research) and Twitch (-2 and -1).
For comparison, I also did the same exercise with the top 50 S&P500 companies by market cap, with high-scoring ones including Microsoft and Visa, and low-scoring ones including Coca Cola and Salesforce.
This scale is Very Made-up and Maybe Useless. But, if nothing else, it seemed like a useful way to get grounded in some data before thinking further about the post.
Overall, the distributions ended up very similar, though YC did come out with a higher mean, mostly driven by fewer negative tail companies.
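As a sketch of that comparison (with invented placeholder ratings, not my actual data):

```python
# Hypothetical illustration of the comparison described above; these
# ratings are made-up placeholders on the same -2..2 scale, not the
# actual data I collected.
yc_ratings = [2, 2, 2, 1, 1, 1, 0, 0, -1, -2]
sp_ratings = [2, 1, 1, 0, 0, 0, -1, -1, -2, -2]

def mean(xs):
    return sum(xs) / len(xs)

print(mean(yc_ratings), mean(sp_ratings))
# In this toy version, the YC mean comes out higher, driven mostly by
# fewer -2 companies, matching the pattern in my real exercise.
```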
I was the lead designer on making these graphs, and I found this feedback pretty useful. Thanks!
Most of my response is "I'm a beginner, I'm still learning, and trying to ship things fast means making a bunch of sacrifices. Boy could I point out even worse things that would drive you crazy not being able to unsee them!" :)
One reason for these (e.g. 4) is that the graphs were first plotted using Vega-lite in ObservableHQ, and then exported and retouched in Adobe Illustrator.
(FWIW: I've been working out 4-5x per week for the last few months (from home), and have cut out all fast food/candy/folk-nutrition-bad-seeming-foods for the same period. I have a very solid routine down and am at no risk of procrastinating. The major failure mode for me right now is plateauing or injury. In fact, the majority of people I know who have had a gym habit seem to have plateaued.)
FWIW, I just posted [a new challenge-like thing](https://www.lesswrong.com/posts/5HTaBuxRyRSc4mHnP/thread-for-making-2019-review-accountability-commitments), and following your feedback, among others, I tried making the stakes and norms clearer upfront, and be more explicit about what people are opting in to.
I'll try to elucidate the standards underlying my judgement call:
You admitting it was due to an "ugh" field, and me being worried about my dojo-master-decisions making it seem generally acceptable to shy away from your ugh fields. (Which, well, sometimes it certainly is. They're there for a reason. But not always, and I wanted to build a space where people could confront challenges)
Not conforming to the experiment of "having babble be used to solve one very particular problem in your life". Instead you did something which seemed more theoretical, and less likely to yield creative solutions to one specific problem. The answers seemed to me in the spirit of babble, but not in the spirit of this week's challenge. (One might also say that I wanted a depth-first search and you did a breadth-first one... but I'm not sure that metaphor really holds up)
I don't think I had made it at all explicit or clear to everyone involved that these were actually the standards. But they are the ones I abided by, nonetheless.
Comment by jacobjacob on [deleted post]
Maybe a tag-split would be in order. I think the actual technical, economic field of Mechanism Design gets discussed a bunch on LW. I'm not a huge fan of "Institution design" as a name, since it's not actually an established name for a field (I think?), but it might have slightly different connotations.