A "Core Views on AI Safety" post is now available at https://www.anthropic.com/index/core-views-on-ai-safety
(Linkpost for that is here: https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety.)
I’ve run Hamming circles within CFAR contexts a few times, and once outside. Tips from outside:
Timing can be tricky here! If you do 4x 20m with breaks, and you’re doing this in an evening, then by the time you get to the last person, people might be tired.
Especially so if you started with the Hamming Questions worksheet exercise (link as prereq at top of post).
I think next time I would drop to 15 each, and keep the worksheet.
Thanks for the writeup! The first paper covers the first half of the video series, more or less. I've been working on a second paper which will focus primarily on the induction bump phenomenon (and other things described in the second half of the video series), so much more to come there!
I appreciate the concept of "Numerical-Emotional Literacy". In fact, this is what I personally think/feel the "rationalist project" should be. To the extent I am a "rationalist", what I mean by that is precisely that knowing what I value, and pursuing numerical-emotional literacy around it, is important to me.
To make in-line adjustments, grab a copy of the spreadsheet (https://www.microcovid.org/spreadsheet) and do anything you like to it!
Also, if you live alone and don't have any set agreements with anyone else, then the "budgeting" lens is sort of just a useful tool to guide thinking. Absent pod agreements, as an individual decisionmaker, you should just spend uCoV when it's worth the tradeoff, and not when it's not.
You could think about it as an "annualized" risk, more than an "annual" risk; more like "192 points per week, in a typical week, on average" and it kind of amortizes out, and less like "you have 10k and once you spend it you're done"
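The amortization arithmetic here is just the annual budget spread across weeks. A minimal sketch (assuming the ~10,000-microCOVID annual budget the calculator uses; the numbers are illustrative, not prescriptive):

```python
# Amortizing an annual microCOVID budget into a weekly allowance.
# The 10,000 points/year figure is the ~1%-annual-risk budget assumed here.

def weekly_budget(annual_points: float, weeks_per_year: int = 52) -> float:
    """Spread an annual risk budget evenly across weeks."""
    return annual_points / weeks_per_year

print(round(weekly_budget(10_000)))  # ~192 points in a typical week
```

The point of the "annualized" framing is that overspending one week and underspending the next averages out to the same annual risk.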
There is now a wired article about this tool and the process of creating it: https://www.wired.com/story/group-house-covid-risk-points/
I think the reporter did a great job of capturing what an "SF group house" is like and how to live a kind of "high IQ / high EQ" rationalist-inspired life, so this might be a thing one could send to friends/family about "how we do things".
It's not just Dario, it's a larger subset of OpenAI splitting off: "He and a handful of OpenAI colleagues are planning a new project, which they tell us will probably focus less on product development and more on research. We support their move and we’re grateful for the time we’ve spent working together."
I heard someone wanted to know about usage statistics for the microcovid.org calculator. Here they are!
Sorry to leave you hanging for so long Richard! This is the reason why in the calculator we ask about "number of people typically near you at a given time" for the duration of the event. (You can also think of this as a proxy for "density of people packed into the room".) No reports like that that I'm aware of, alas!
Want to just give credit to all the non-rationalist coauthors of microcovid.org! (7 non-rationalists and 2 "half-rationalists"?)
I've learned a LOT about the incredible power of trusted collaborations between "hardcore epistemics" folks and much more pragmatic folks with other skillsets (writing, UX design, medical expertise with ordinary people as patients, etc). By our powers combined we were able to build something usable by non-rationalist-but-still-kinda-quantitative folks, and are on our way to something usable by "normal people" 😲.
We've been able to get a lot more scale of distribution/usage/uptake with a webapp, than if we had just released a spreadsheet & blogpost. And coauthors put everything I wrote through MANY rounds of extensive writing/copy changes to be more readable by ordinary folks. We get feedback often that we've changed someone's entire way of thinking about risks and probabilities. This has surprised and delighted me. And I think the explicit synthesis between rationalist and non-rationalist perspectives on the team has been directly helpful.
Also, don't forget to factor in "kicking off a chain of onwards infections" into your COVID avoidance price somehow. You can't stop at valuing "cost of COVID to *me*".
We don't really know how to do this properly yet, but see discussion here: https://forum.effectivealtruism.org/posts/MACKemu3CJw7hcJcN/microcovid-org-a-tool-to-estimate-covid-risk-from-common?commentId=v4mEAeehi4d6qXSHo#No5yn8nves7ncpmMt
Sadly nothing useful. As mentioned here (https://www.microcovid.org/paper/2-riskiness#fn6) we think it's not higher than 10%, but we haven't found anything to bound it further.
"I've heard people make this claim before but without explaining why. [...] the key risk factors for a dining establishment are indoor vs. outdoor, and crowded vs. spaced. The type of liquor license the place has doesn't matter."
I think you're misunderstanding how the calculator works. All the saved scenarios do is fill in the parameters below. The only substantial difference between "restaurant" and "bar" is that we assume bars are places people speak loudly. That's all. If the bar you have in mind isn't like that, just change the parameters.
entry-level leadership
It has become really salient to me recently that good practice involves lots of prolific output in low-stakes throwaway contexts. Whereas a core piece of EA and rationalist mindsets is steering towards high-stakes things to work on, and treating your outputs as potentially very impactful and not to be thrown away. In my own mind “practice mindset” and “impact mindset” feel very directly in tension.
I have a feeling that something around this mindset difference is part of why world-saving orientation in a community might be correlated with inadequate opportunities for low-stakes leadership practice.
Here's another further-afield steelman, inspired by blameless postmortem culture.
When debriefing / investigating a bad outcome, it's better for participants to expect not to be labeled as "bad people" (implicitly or explicitly) as a result of coming forward with information about choices they made that contributed to the failure.
More social pressure against publicly admitting that one is contributing poorly leads to systematic hiding/obfuscation of information about why people are making those choices (e.g. incentives). And we need all that information out in the clear (or at least available to investigators who are committed & empowered to solve the systemic issues) if we are going to have any chance of making lasting changes.
In general, I'm curious what Zvi and Ben think about the interaction between "I expect people to yell at me if I say I'm doing this" and promoting/enabling "honest accounting".
Another distinction I think is important, for the specific example of "scientific fraud vs. cow suffering" as a hypothetical:
Science is a terrible career for almost any goal other than actually contributing to the scientific endeavor.
I have a guess that "science, specifically" as a career-with-harmful-impacts in the hypothetical was not specifically important to Ray, but that it was very important to Ben. And that if the example career in Ray's "which harm is highest priority?" thought experiment had been "high-frequency-trading" (or something else that some folks believe has harms when ordinarily practiced, but is lucrative and thus could have benefits worth staying for, and is not specifically a role of stewardship over our communal epistemics) that Ben would have a different response. I'm curious to what extent that's true.
One distinction I see getting elided here:
I think one's limited resources (time, money, etc) are a relevant question in one's behavior, but a "goodness budget" is not relevant at all.
For example: In a world where you could pay $50 to the electric company to convert all your electricity to renewables, or pay $50 more to switch from factory to pasture-raised beef, then if someone asks "hey, your household electrical bill is destroying the environment, why didn't you choose the green option", a relevant reply is "because I already spent my $50 on cow suffering".
However, if both options cost $0, then "but I already switched to pasture-raised beef" is just irrelevant in its entirety.
The recent EA meta fund announcement linked to this post (https://www.centreforeffectivealtruism.org/blog/the-fidelity-model-of-spreading-ideas), which highlights another, parallel approach: in addition to picking idea expressions that fail gracefully, prefer transmission methods that preserve nuance.
Nah, it's purely a formatting error - the trailing parenthesis was included in the link erroneously. Added whitespace to fix now.
If you have ovaries/uterus, a non-zero interest in having kids with your own gametes, and you're at least 25 or so: Get a fertility consultation.
They do an ultrasound and a blood test to estimate your ovarian reserve. Until you either try to conceive or get other measurements, you don't know if you have normal fertility for your age, or if your fertility is already declining without knowing it.
This is important information to have, in order to make later informed decisions (such as when and whether to freeze your eggs, when to start looking for a child-raising partner, the date by which you need to decide before it's too late, etc.)
(I wrote more about this here: https://paper.dropbox.com/doc/Egg-freezing-catherios-info-for-friends--AbyB0V0bRUZsCM~QbeEzkNuMAg-tI98uI9kmLOlLRRuO80Zh )
Two observations:
- I'd expect that most "AI capabilities research" that goes on today isn't meaningfully moving us towards AGI at all, let alone aligned AGI. For example, applying reinforcement learning to hospital data. So "how much $ went to AI in 2018" would be a sloppy upper bound on "important thoughts/ideas/tools on the path to AGI".
- There's a lot of non-capabilities non-AGI research targeted at "making the thing better for humanity, not more powerful". For example, interpretability work on models simpler than convnets, or removing bias from word embeddings. If by "AI safety" you mean "technical AGI alignment" or "reducing x-risk from advanced AI" this category definitely isn't that, but it also definitely isn't "AI capabilities" let alone "AGI capabilities".
Important updates to your model:
- OpenAI recently hired Chris Olah (and his collaborator Ludwig Schubert), so *interpretability* is going to be a major and increasing focus at that org (not just deep RL). This is an important upcoming shift to have on your radar.
- DeepMind has at least two groups doing safety-related research: the one we know of as "safety" is more properly the "Technical AGI Safety" team, but there is also a "Safe and Robust AI team" that does more like neural net verification and adversarial examples.
- RE "General AI work in industry" - I've increasingly become aware of a number of somewhat-junior researchers who do work in a safety-relevant area (learning from human preferences, interpretability, robustness, safe exploration, verification, adversarial examples, etc.), and who are indeed long-term-motivated (determined once we say the right shibboleths at each other) but aren't on a "safety team". This gives me more evidence that if you're able to get a job anywhere within Brain or DeepMind (or honestly any other industry research lab), you can probably hill-climb your way to relevant mentorship and start doing relevant stuff.
Less important notes:
- I'm at Google Brain right now, not OpenAI!
- I wrote up a guide which I hope is moderately helpful in terms of what exactly one might do if one is interested in this path: https://80000hours.org/articles/ml-engineering-career-transition-guide/
- Here's a link for the CHAI research engineering post: https://humancompatible.ai/jobs#engineer
Our collective total years of experience is ~119 times the age of the universe. (The universe is 13.8 billion years old, versus 1.65 trillion total human experience years so far).
Also: at 7.44 billion people alive right now, we collectively experience the age of the universe every ~2 years (https://twitter.com/karpathy/status/850772106870640640?lang=en)
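Both figures are easy to recompute. A quick check (using the numbers quoted above, not independently re-derived from demographic data):

```python
# Sanity-check the collective-experience arithmetic from the comment above.
# The 1.65 trillion experience-years and 7.44 billion population figures
# are the ones quoted in the comment, not independently sourced here.

AGE_OF_UNIVERSE = 13.8e9     # years
TOTAL_EXPERIENCE = 1.65e12   # total human experience-years so far
POPULATION = 7.44e9          # people alive right now

ratio = TOTAL_EXPERIENCE / AGE_OF_UNIVERSE
print(f"~{ratio:.0f}x the age of the universe")  # ~119-120x

years_per_universe = AGE_OF_UNIVERSE / POPULATION
print(f"one universe-age of experience every ~{years_per_universe:.1f} years")
```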
Can ... I set moderation norms, or not?
I hadn't read that link on the side-taking hypothesis of morality before, but I note that if you find that argument interesting, you would like Gillian Hadfield's book "Rules for a Flat World". She talks about law (not "what courts and congress do" but broadly "the enterprise of subjecting human conduct to rules") and emphasizes that law is similar to norms/morality, except in addition there is a canonical place that "the rules" get posted and also a canonical way to obtain a final arbitration about questions of "did person X break the rule?". She emphasizes that these properties enable third-party enforcement of rules with much less assumption of personal risk (because otherwise, if there's no final arbitration about whether a rule got broken, someone might punish *me* for punishing the rule-breaker). While other primates have altruism and even norms, they do not appear to have third-party enforcement. Anyway, consider this a book recommendation.
I'm a little perplexed about what you find horrifying about the side-taking hypothesis. In my view, the whole point of everything is basically to assemble the largest possible coalition of as many beings as we can possibly coordinate, using the best possible coordination mechanisms we collectively have access to, so that as many as possible of us can play this game and have a good time playing it for as long as we can. Of course we need to protect that coalition and defend it from its enemies, because there will always be enemies. But hopefully we can make there be fewer of them so that more of us can play.
If that's the whole point of everything, then a system in which we can constantly make coordinated decisions about which side is "the big coalition of all of us" and keep the number of enemies to a minimum seems like *fantastic* technology and I want us all to be using it.
As a side note, I saw recently somewhere in the blogosphere a discussion about whether the development of human intelligence was fueled by advantages in creating laws (versus "breaking laws" or "some other reason"), but I don't recall where that was and I would appreciate a reference if someone has one. The basic idea was that laws and morality both require a kind of abstract thinking - logical quantifiers like "for all people with property X" and "Y is allowed only if Z" - which, lo and behold, homo sapiens seems to have evolved for some reason, and that reason might've been to reason abstractly about social rules. (Indeed, people are much better at the Wason card-flipping task when policing a social rule rather than deducing abstract properties.)
FWIW, this claim doesn't match my intuition, and googling around, I wasn't able to quickly find any papers or blog posts supporting it.
"Explaining and Harnessing Adversarial Examples" (Goodfellow et al. 2014) is the original demonstration that "Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples".
I'll emphasize that high-dimensionality is a crucial piece of the puzzle, which I haven't seen you bring up yet. You may already be aware of this, but it bears repeating: the usual intuitions do not even remotely apply in high-dimensional spaces. Check out Counterintuitive Properties of High Dimensional Space.
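To make the linearity point concrete, here's a toy illustration of my own (a bare linear score, not the setup from any of the cited papers): for a linear score w·x, the perturbation ε·sign(w) shifts the score by ε·‖w‖₁, which grows linearly with dimension, so a per-coordinate change far too small to notice can swing the output arbitrarily far.

```python
import random

# Toy illustration: linearity + high dimension yields adversarial examples.
# For a linear score s(x) = w . x, the perturbation eps * sign(w) (the
# fast-gradient-sign direction for a linear model) shifts the score by
# eps * sum(|w_i|) = eps * d here, since every |w_i| = 1.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
eps = 0.01  # imperceptibly small per-coordinate change

for d in (10, 1_000, 100_000):
    w = [random.choice([-1.0, 1.0]) for _ in range(d)]
    x = [0.0] * d
    x_adv = [xi + eps * (1.0 if wi > 0 else -1.0) for xi, wi in zip(x, w)]
    # Score shift grows linearly with dimension: ~eps * d.
    print(d, dot(w, x_adv) - dot(w, x))
```

With ε = 0.01, the same imperceptible per-coordinate budget moves the score by ~0.1 in 10 dimensions but by ~1000 in 100,000 dimensions.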
adversarial examples are only a thing because the wrong decision boundary has been learned
In my opinion, this is spot-on - not only your claim that there would be no adversarial examples if the decision boundary were perfect, but in fact a group of researchers are beginning to think that in a broader sense "adversarial vulnerability" and "amount of test set error" are inextricably linked in a deep and foundational way - that they may not even be two separate problems. Here are a few citations that point at some pieces of this case:
- "Adversarial Spheres" (Gilmer et al. 2017) - "For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size O(1/√d)." (emphasis mine)
- I think this paper is truly fantastic in many respects.
- The central argument can be understood from the intuitions presented in Counterintuitive Properties of High Dimensional Space, in the section titled Concentration of Measure (Figure 9). Where it says "As the dimension increases, the width of the band necessary to capture 99% of the surface area decreases rapidly," you can replace that with "As the dimension increases, a decision-boundary hyperplane with 1% test error rapidly gets extremely close to the equator of the sphere." The hyperplane's small distance from the center of the sphere is what gives rise to the small epsilon at which you can find an adversarial example.
- "Intriguing Properties of Adversarial Examples" (Cubuk et al. 2017) - "While adversarial accuracy is strongly correlated with clean accuracy, it is only weakly correlated with model size"
- I haven't read this paper, but I've heard good things about it.
To summarize, my belief is that any model that is trying to learn a decision boundary in a high-dimensional space, and is basically built out of linear units with some nonlinearities, will be susceptible to small-perturbation adversarial examples so long as it makes any errors at all.
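The concentration-of-measure picture can be checked with a quick Monte Carlo sketch (my own illustration, not code from the cited papers): for uniform points on the unit d-sphere, the typical distance to a fixed equatorial hyperplane shrinks like ~1/√d, so a boundary that misclassifies even a small fraction of the sphere sits within a small epsilon of almost every point.

```python
import math
import random

# Monte Carlo sketch: on the unit d-sphere, the distance from a random point
# to a fixed equatorial hyperplane (x_0 = 0) concentrates around ~1/sqrt(d).
# This is the "narrow band captures almost all the surface area" picture.

def random_sphere_point(d, rng):
    """Uniform point on the unit (d-1)-sphere via a normalized Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(0)
for d in (10, 100, 1_000):
    samples = [abs(random_sphere_point(d, rng)[0]) for _ in range(1_000)]
    mean_dist = sum(samples) / len(samples)
    # Mean |x_0| is ~sqrt(2 / (pi * d)), shrinking toward zero as d grows.
    print(d, round(mean_dist, 4), round(math.sqrt(2 / (math.pi * d)), 4))
```

As d grows the empirical mean distance tracks the ~1/√d prediction, which is exactly why "1% test error" and "tiny adversarial epsilon" end up linked in high dimensions.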
(As a note - not trying to be snarky, just trying to be genuinely helpful, Cubuk et al. 2017 and Goodfellow et al. 2014 are my top two hits for "adversarial examples linearity" in an incognito tab)
When evaluating whether there is a broad base of support, I think it's important to distinguish "one large-scale funder" from "narrow overall base of support". Before the Arnold foundation's funding, the reproducibility project had a broad base of committed participants contributing their personal resources and volunteering their time.
To add some details from personal experience: In late 2011 and early 2012, the Reproducibility Project was a great big underfunded labor of love. Brian Nosek had outlined a plan to replicate ~50 studies - this became the Science 2015 paper. He was putting together spreadsheets to coordinate everyone, and hundreds of researchers who were personally committed to the cause were allocating their own discretionary funds and working in their spare time to get the replications done in their own labs. The mailing list was thriving. Researchers were paying subjects out-of-pocket. Reproducibility wasn't a full-blown memetic explosion in the public eye, nor was there a major source of funding, but we were getting notable media coverage, and researchers kept joining.
Importantly, I think we were already firmly on track to write the 2015 Science paper before the Arnold Foundation took notice of the coverage that existing projects were getting and began reaching out to Nosek and others to ask if they could do more with more funding.
When the Center for Open Science was founded, it increased the scale of coordination that Brian and other coordinators were able to execute amongst participants. I'd guess that Brian himself was also able to spend more time talking to the media. The base of participating researchers remained broad and unpaid. I'd guess that the vast majority of researchers contributing personally to the reproducibility movement are still not getting any earmarked funds for it.
I wasn't aware of the details of COS's funding before reading this article, so I have no additional evidence about whether there are more large-scale funders. A brief round of Googling turns up a few other Open Science flavored sources of money (e.g. https://www.openscienceprize.org/res/p/FAQ/) but these are not specific to reproducibility; rather, they're more broadly targeted towards open sharing of code, data, and methods.
A few suggested takeaways:
There may be other cases where an existing movement with an enthusiastic base of participants is funding-limited in the scale of coordination and publicity they can achieve, and a single motivated funder can make a substantial impact by adding that type of funding.
Under "normal" funding and incentive conditions, the reproducibility project was able to form and begin producing concrete and impactful output, but thereafter it appears that only one major source of funding materialized and no other dedicated large-scale funding has been available. I think this should make you feel optimistic about academic researchers as individuals and as a culture, but pessimistic about traditional academic funding routes, rather than monolithically pessimistic about academia as a whole.