Posts

Unsolved research problems vs. real-world threat models 2019-03-26T22:10:08.371Z · score: 19 (7 votes)

Comments

Comment by catherio on No, it's not The Incentives—it's you · 2019-06-19T23:41:34.186Z · score: 20 (8 votes) · LW · GW

Here's another further-afield steelman, inspired by blameless postmortem culture.

When debriefing / investigating a bad outcome, it's better for participants to expect not to be labeled as "bad people" (implicitly or explicitly) as a result of coming forward with information about choices they made that contributed to the failure.

The more social pressure there is against publicly admitting that one is contributing poorly, the more people systematically hide or obfuscate information about why they are making those choices (e.g. incentives). And we need all that information to be out in the clear (or at least available to investigators who are committed & empowered to solve the systemic issues), if we are going to have any chance of making lasting changes.


In general, I'm curious what Zvi and Ben think about the interaction between "I expect people to yell at me if I say I'm doing this" and promoting/enabling "honest accounting".

Comment by catherio on No, it's not The Incentives—it's you · 2019-06-14T07:48:31.968Z · score: 16 (6 votes) · LW · GW

Another distinction I think is important, for the specific example of "scientific fraud vs. cow suffering" as a hypothetical:

Science is a terrible career for almost any goal other than actually contributing to the scientific endeavor.

I have a guess that "science, specifically" as a career-with-harmful-impacts in the hypothetical was not specifically important to Ray, but that it was very important to Ben. And that if the example career in Ray's "which harm is highest priority?" thought experiment had been "high-frequency-trading" (or something else that some folks believe has harms when ordinarily practiced, but is lucrative and thus could have benefits worth staying for, and is not specifically a role of stewardship over our communal epistemics) that Ben would have a different response. I'm curious to what extent that's true.

Comment by catherio on No, it's not The Incentives—it's you · 2019-06-14T07:34:31.517Z · score: 14 (6 votes) · LW · GW

One distinction I see getting elided here:

I think one's limited resources (time, money, etc) are a relevant consideration in one's behavior, but a "goodness budget" is not relevant at all.

For example: In a world where you could pay $50 to the electric company to convert all your electricity to renewables, or pay $50 more to switch from factory to pasture-raised beef, then if someone asks "hey, your household electrical bill is destroying the environment, why didn't you choose the green option", a relevant reply is "because I already spent my $50 on cow suffering".

However, if both options cost $0, then "but I already switched to pasture-raised beef" is just irrelevant in its entirety.

Comment by catherio on You Have About Five Words · 2019-05-24T01:11:57.561Z · score: 4 (2 votes) · LW · GW

The recent EA meta fund announcement linked to this post (https://www.centreforeffectivealtruism.org/blog/the-fidelity-model-of-spreading-ideas ) which highlights another parallel approach: in addition to picking idea expressions that fail gracefully, one can prefer transmission methods that preserve nuance.

Comment by catherio on Boring Advice Repository · 2019-04-25T01:51:57.290Z · score: 6 (3 votes) · LW · GW

Nah, it's purely a formatting error - the trailing parenthesis was included in the link erroneously. Added whitespace to fix now.

Comment by catherio on Boring Advice Repository · 2019-04-25T00:08:12.916Z · score: 19 (4 votes) · LW · GW

If you have ovaries/uterus, a non-zero interest in having kids with your own gametes, and you're at least 25 or so: Get a fertility consultation.

They do an ultrasound and a blood test to estimate your ovarian reserve. Until you either try to conceive or get other measurements, you don't know if you have normal fertility for your age, or if your fertility is already declining without your knowing it.

This is important information to have in order to make informed decisions later (such as when and whether to freeze your eggs, when to start looking for a child-raising partner, what deadline you need to decide by before it's too late, etc.)

(I wrote more about this here: https://paper.dropbox.com/doc/Egg-freezing-catherios-info-for-friends--AbyB0V0bRUZsCM~QbeEzkNuMAg-tI98uI9kmLOlLRRuO80Zh )

Comment by catherio on How much funding and researchers were in AI, and AI Safety, in 2018? · 2019-03-12T08:30:52.967Z · score: 8 (5 votes) · LW · GW

Two observations:

  • I'd expect that most "AI capabilities research" that goes on today isn't meaningfully moving us towards AGI at all, let alone aligned AGI. For example, applying reinforcement learning to hospital data. So "how much $ went to AI in 2018" would be a sloppy upper bound on "important thoughts/ideas/tools on the path to AGI".
  • There's a lot of non-capabilities non-AGI research targeted at "making the thing better for humanity, not more powerful". For example, interpretability work on models simpler than convnets, or removing bias from word embeddings. If by "AI safety" you mean "technical AGI alignment" or "reducing x-risk from advanced AI" this category definitely isn't that, but it also definitely isn't "AI capabilities" let alone "AGI capabilities".

Comment by catherio on Current AI Safety Roles for Software Engineers · 2018-11-10T07:04:32.800Z · score: 16 (8 votes) · LW · GW

Important updates to your model:

  • OpenAI recently hired Chris Olah (and his collaborator Ludwig Schubert), so *interpretability* is going to be a major and increasing focus at that org (not just deep RL). This is an important upcoming shift to have on your radar.
  • DeepMind has at least two groups doing safety-related research: the one we know of as "safety" is more properly the "Technical AGI Safety" team, but there is also a "Safe and Robust AI" team that works more on things like neural net verification and adversarial examples.
  • RE "General AI work in industry" - I've increasingly become aware of a number of somewhat-junior researchers who do work in a safety-relevant area (learning from human preferences, interpretability, robustness, safe exploration, verification, adversarial examples, etc.), and who are indeed long-term-motivated (determined once we say the right shibboleths at each other) but aren't on a "safety team". This gives me more evidence that if you're able to get a job anywhere within Brain or DeepMind (or honestly any other industry research lab), you can probably hill-climb your way to relevant mentorship and start doing relevant stuff.

Less important notes:

  • I'm at Google Brain right now, not OpenAI!
  • I wrote up a guide which I hope is moderately helpful in terms of what exactly one might do if one is interested in this path: https://80000hours.org/articles/ml-engineering-career-transition-guide/
  • Here's a link for the CHAI research engineering post: https://humancompatible.ai/jobs#engineer

Comment by catherio on The funnel of human experience · 2018-10-12T00:06:30.508Z · score: 19 (7 votes) · LW · GW

Our collective total years of experience is ~119 times the age of the universe. (The universe is 13.8 billion years old, versus 1.65 trillion total human experience years so far).

Also: at 7.44 billion people alive right now, we collectively experience the age of the universe every ~2 years (https://twitter.com/karpathy/status/850772106870640640?lang=en)
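A quick check of the arithmetic behind both figures, as a minimal sketch. The three constants are the numbers quoted above; the function names are made up for illustration.

```python
# Figures quoted in the comment above.
AGE_OF_UNIVERSE_YEARS = 13.8e9         # 13.8 billion years
TOTAL_EXPERIENCE_YEARS = 1.65e12       # 1.65 trillion human experience years
PEOPLE_ALIVE = 7.44e9                  # 7.44 billion people


def universe_ages_experienced(total_years=TOTAL_EXPERIENCE_YEARS,
                              universe_age=AGE_OF_UNIVERSE_YEARS):
    """How many 'ages of the universe' fit into total human experience."""
    return total_years / universe_age


def years_to_experience_one_universe_age(people=PEOPLE_ALIVE,
                                         universe_age=AGE_OF_UNIVERSE_YEARS):
    """Calendar years for `people` living in parallel to accumulate
    one universe-age of experience (one person-year per person per year)."""
    return universe_age / people


if __name__ == "__main__":
    print(f"{universe_ages_experienced():.1f}")             # 119.6
    print(f"{years_to_experience_one_universe_age():.2f}")  # 1.85
```

So the "~119 times" and "every ~2 years" figures are both consistent with the stated inputs, up to rounding.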

Comment by catherio on [deleted post] 2018-08-07T04:32:07.442Z

Can ... I set moderation norms, or not?

Comment by catherio on Personal relationships with goodness · 2018-05-19T04:54:57.464Z · score: 21 (5 votes) · LW · GW

I hadn't read that link on the side-taking hypothesis of morality before, but I note that if you find that argument interesting, you would like Gillian Hadfield's book "Rules for a Flat World". She talks about law (not "what courts and congress do" but broadly "the enterprise of subjecting human conduct to rules") and emphasizes that law is similar to norms/morality, except in addition there is a canonical place that "the rules" get posted and also a canonical way to obtain a final arbitration about questions of "did person X break the rule?". She emphasizes that these properties enable third-party enforcement of rules with much less assumption of personal risk (because otherwise, if there's no final arbitration about whether a rule got broken, someone might punish *me* for punishing the rule-breaker). While other primates have altruism and even norms, they do not appear to have third-party enforcement. Anyway, consider this a book recommendation.

I'm a little perplexed about what you find horrifying about the side-taking hypothesis. In my view, the whole point of everything is basically to assemble the largest possible coalition of as many beings as we can possibly coordinate, using the best possible coordination mechanisms we collectively have access to, so that as many as possible of us can play this game and have a good time playing it for as long as we can. Of course we need to protect that coalition and defend it from its enemies, because there will always be enemies. But hopefully we can make there be fewer of them so that more of us can play.

If that's the whole point of everything, then a system in which we can constantly make coordinated decisions about which side is "the big coalition of all of us" and keep the number of enemies to a minimum seems like *fantastic* technology and I want us all to be using it.

As a side note, I recently saw somewhere in the blogosphere a discussion about whether the development of human intelligence was fueled by advantages in creating laws (versus "breaking laws" or "some other reason"), but I don't recall where that was and I would appreciate a reference if someone has one. The basic idea was that laws and morality both require a kind of abstract thinking - logical quantifiers like "for all people with property X" and "Y is allowed only if Z" - which, lo and behold, Homo sapiens seems to have evolved for some reason, and that reason might've been to reason abstractly about social rules. (Indeed, people are much better at the Wason card-flipping task when policing a social rule rather than deducing abstract properties).

Comment by catherio on Critch on career advice for junior AI-x-risk-concerned researchers · 2018-05-19T03:21:20.381Z · score: 31 (7 votes) · LW · GW

FWIW, this claim doesn't match my intuition, and googling around, I wasn't able to quickly find any papers or blog posts supporting it.

"Explaining and Harnessing Adversarial Examples" (Goodfellow et al. 2014) is the original demonstration that "Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples".

I'll emphasize that high-dimensionality is a crucial piece of the puzzle, which I haven't seen you bring up yet. You may already be aware of this, but I'll emphasize it anyway: the usual intuitions do not even remotely apply in high-dimensional spaces. Check out Counterintuitive Properties of High Dimensional Space.
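To make the role of dimensionality concrete, here is a minimal sketch of the linear argument, under my own assumptions rather than anything taken from the paper's code: a single linear unit with weights drawn from a standard normal, and a perturbation that moves every input coordinate by at most a fixed epsilon in the direction of the sign of its weight. The function name and defaults are hypothetical.

```python
import random


def activation_shift(dim, epsilon=0.01, seed=0):
    """Change in a linear unit's output w.x when every input coordinate
    is nudged by at most `epsilon`, in the direction sign(w_i).

    Assumption for illustration: weights are i.i.d. standard normal.
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Worst-case perturbation under a max-norm budget of epsilon.
    perturbation = [epsilon if w > 0 else -epsilon for w in weights]
    # The shift equals epsilon * sum(|w_i|), so it grows with the dimension
    # even though no single coordinate moved by more than epsilon.
    return sum(w * p for w, p in zip(weights, perturbation))


if __name__ == "__main__":
    for dim in (10, 1_000, 100_000):
        print(dim, activation_shift(dim))
```

The per-coordinate change stays fixed at epsilon, but the output shift scales roughly linearly with the number of dimensions, which is why low-dimensional intuitions about "tiny" perturbations mislead here.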

"adversarial examples are only a thing because the wrong decision boundary has been learned"

In my opinion, this is spot-on. Not only is it true that there would be no adversarial examples if the decision boundary were perfect, but a group of researchers is beginning to think that, in a broader sense, "adversarial vulnerability" and "amount of test set error" are inextricably linked in a deep and foundational way, and that they may not even be two separate problems. Here are a few citations that point at some pieces of this case:

  • "Adversarial Spheres" (Gilmer et al. 2017) - "For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size O(1/√d)." (emphasis mine)
    • I think this paper is truly fantastic in many respects.
    • The central argument can be understood from the intuitions presented in Counterintuitive Properties of High Dimensional Space in the section titled Concentration of Measure (Figure 9). Where it says "As the dimension increases, the width of the band necessary to capture 99% of the surface area decreases rapidly." you can just replace that with "As the dimension increases, a decision-boundary hyperplane that has 1% test error rapidly gets extremely close to the equator of the sphere". "Small distance from the center of the sphere" is what gives rise to "Small epsilon at which you can find an adversarial example".
  • "Intriguing Properties of Adversarial Examples" (Cubuk et al. 2017) - "While adversarial accuracy is strongly correlated with clean accuracy, it is only weakly correlated with model size"
    • I haven't read this paper, but I've heard good things about it.
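The concentration-of-measure intuition cited under "Adversarial Spheres" above can be checked numerically. This is a rough Monte Carlo sketch, not the paper's method: it samples points uniformly on the unit sphere (by normalizing Gaussian vectors) and estimates how wide a band around the equator must be to contain 99% of them. The function name, sample count, and seed are arbitrary choices for illustration.

```python
import math
import random


def band_half_width(dim, coverage=0.99, samples=2000, seed=0):
    """Estimate the half-width h such that a fraction `coverage` of points
    on the unit sphere in R^dim satisfy |x_1| <= h (a band at the equator)."""
    rng = random.Random(seed)
    heights = []
    for _ in range(samples):
        # A normalized Gaussian vector is uniform on the sphere.
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in point))
        heights.append(abs(point[0]) / norm)
    heights.sort()
    # Empirical quantile of the distance from the equatorial hyperplane.
    index = min(samples - 1, math.ceil(coverage * samples) - 1)
    return heights[index]


if __name__ == "__main__":
    for dim in (3, 30, 300):
        print(dim, round(band_half_width(dim), 3))
```

In 3 dimensions the band has to cover nearly the whole sphere, while in a few hundred dimensions it is already a thin strip. Read the other way around, a hyperplane that cuts off only 1% of the sphere must pass very close to the center, which is the link between a small error rate and a small adversarial distance.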

To summarize, my belief is that any model that is trying to learn a decision boundary in a high-dimensional space, and is basically built out of linear units with some nonlinearities, will be susceptible to small-perturbation adversarial examples so long as it makes any errors at all.

(As a note - not trying to be snarky, just trying to be genuinely helpful - Cubuk et al. 2017 and Goodfellow et al. 2014 are my top two hits for "adversarial examples linearity" in an incognito tab.)

Comment by catherio on Funding the Reproducibility Crises as effective giving · 2017-01-27T06:33:30.613Z · score: 14 (7 votes) · LW · GW

When evaluating whether there is a broad base of support, I think it's important to distinguish "one large-scale funder" from "narrow overall base of support". Before the Arnold Foundation's funding, the Reproducibility Project had a broad base of committed participants contributing their personal resources and volunteering their time.

To add some details from personal experience: In late 2011 and early 2012, the Reproducibility Project was a great big underfunded labor of love. Brian Nosek had outlined a plan to replicate ~50 studies - this became the Science 2015 paper. He was putting together spreadsheets to coordinate everyone, and hundreds of researchers who were personally committed to the cause were allocating their own discretionary funds and working in their spare time to get the replications done in their own labs. The mailing list was thriving. Researchers were paying subjects out-of-pocket. Reproducibility wasn't a full-blown memetic explosion in the public eye, nor was there a major source of funding, but we were getting notable media coverage, and researchers kept joining.

Importantly, I think we were already firmly on track to write the 2015 Science paper before the Arnold Foundation took notice of the coverage that existing projects were getting and began reaching out to Nosek and others to ask if they could do more with more funding.

When the Center for Open Science was founded, it increased the scale of coordination that Brian and other coordinators were able to execute amongst participants. I'd guess that Brian himself was also able to spend more time talking to the media. The base of participating researchers remained broad and unpaid. I'd guess that the vast majority of researchers contributing personally to the reproducibility movement are still not getting any earmarked funds for it.

I wasn't aware of the details of COS's funding before reading this article, so I have no additional evidence about whether there are more large-scale funders. A brief round of Googling turns up a few other Open Science flavored sources of money (e.g. https://www.openscienceprize.org/res/p/FAQ/) but these are not specific to reproducibility; rather, they're more broadly targeted towards open sharing of code, data, and methods.

A few suggested takeaways:

  • There may be other cases where an existing movement with an enthusiastic base of participants is funding-limited in the scale of coordination and publicity they can achieve, and a single motivated funder can make a substantial impact by adding that type of funding.

  • Under "normal" funding and incentive conditions, the Reproducibility Project was able to form and begin producing concrete and impactful output, but thereafter it appears that only one major source of funding materialized and no other dedicated large-scale funding has been available. I think this should make you feel optimistic about academic researchers as individuals and as a culture, but pessimistic about traditional academic funding routes, rather than monolithically pessimistic about academia as a whole.