If I encounter a capabilities paper that kinda spooks me, what should I do with it?
post by the gears to ascension (lahwran) · 2023-02-03T21:37:36.689Z · LW · GW · 3 comments
This is a question post.
Contents
Answers: Nathan Helm-Burger (12) · NicholasKees (4) · Seth Herd (4) · mako yass (1) · Alexander Gietelink Oldenziel (1)
3 comments
If I encounter a capabilities paper that kinda spooks me, what should I do with it? I'm inclined to share it as a draft post with some people I think should know about it. I have encountered such a paper; I found it in a capabilities discussion group whose members will have no hesitation about using it to try to accumulate power for themselves, in denial about any negative effects it could have. The technique runs on individual computers.
edit for capabilities folks finding this later: it was a small-potatoes individual paper again, lol. but the approach is one I still think is very promising.
Answers
There is an organizational structure in the process of being developed explicitly for handling this. In the meantime, please reach out to the EA community health team, attn: 'AGI risk landscape watch team': https://docs.google.com/forms/d/e/1FAIpQLScJooJD0Sm2csCYgd0Is6FkpyQa3ket8IIcFzd_FcTRU7avRg/viewform
(I've been talking to the people involved and can assure you that I believe them to be both trustworthy and competent.)
I think it's really important for everyone to always have a trusted confidant, and to go to them directly with this sort of thing first before doing anything. It is in fact a really tough question, and no one will be good at thinking about this on their own. Also, for situations that might breed a unilateralist's curse type of thing, strongly err on the side of NOT DOING ANYTHING.
I'd say share it here on LessWrong, both in comments on relevant articles and in a post. That maximizes the upside of safety-oriented people knowing about it while minimally helping to popularize it among capabilities groups.
Email it to Anthropic?
And I guess they shouldn't say too much in return, but they should at least indicate that they understand it. If they don't, send it to the next friendliest practical alignment researchers. If they don't signal understanding either, and you have to go far enough down your list that telling the next one on it would no longer be a net positive act, then you'll have a major community issue to yell about.
Currently there seems to be no good way to deal with this conundrum.
One wonders whether the AI alignment community should set up some sort of encrypted partial-sharing, partial-transparency protocol for these kinds of situations.
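As one illustration of what "partial transparency" could mean in practice, here is a minimal sketch of a hash-commitment scheme. This is not an existing community tool or anyone's concrete proposal, just one possible mechanism: you publicly commit to having found a worrying result (so the timeline is verifiable later) without revealing its content, then disclose the preimage privately to trusted parties, or publicly once that seems safe. The workflow and example content below are hypothetical.

```python
# Minimal sketch of a hash-commitment "partial transparency" mechanism.
# Hypothetical illustration only; not an existing tool or protocol.
import hashlib
import os


def commit(document: bytes) -> tuple[str, bytes]:
    """Return (public commitment, secret nonce) for a document.

    The random nonce prevents anyone from brute-forcing the commitment by
    hashing guessed documents (e.g. known arXiv papers).
    """
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + document).hexdigest()
    return digest, nonce


def verify(document: bytes, nonce: bytes, commitment: str) -> bool:
    """Check that a revealed document and nonce match an earlier commitment."""
    return hashlib.sha256(nonce + document).hexdigest() == commitment


if __name__ == "__main__":
    paper = b"summary of the worrying result"  # hypothetical content
    public_commitment, secret_nonce = commit(paper)
    # Post `public_commitment` publicly; keep `paper` and `secret_nonce` private.
    print("commitment:", public_commitment)
    # Later, after private disclosure (or once it is safe to go public):
    assert verify(paper, secret_nonce, public_commitment)
```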
3 comments
Comments sorted by top scores.
comment by ChristianKl · 2023-02-03T23:12:37.464Z · LW(p) · GW(p)
Do you expect that there are papers that spook you but that wouldn't get attention if you didn't tell other people about them?
Reply by the gears to ascension (lahwran) · 2023-02-03T23:13:13.676Z · LW(p) · GW(p)
attention isn't binary. Giving a paper more attention because I think it is very powerful could still be the spark that gets it to spread much faster, if the folks who've seen it don't yet realize how powerful it is. This is extremely common; mere combinations of papers are often enough for the abstract of the obvious followup to nearly write itself in the heads of competent researchers. In general, the most competent capabilities researchers do not announce paper lists from the rooftops for this reason: they try to build the followup and then announce that. In general I don't think I am being watched by many high-competence folks, and the ones who are watching probably simply explore the internet manually the same way I do. But it's something that I always have in mind, and occasionally I see a paper that really raises my hackles.
Reply by awg · 2023-02-03T23:18:24.944Z · LW(p) · GW(p)
If it truly raises your hackles, then maybe it's worth sharing directly with at least one or two people who are working in safety research? Spreading it by ones and twos amongst people who would use the information for good (as it were) doesn't seem too dangerous to me.