Desperately looking for the right person to discuss an alignment-related idea with. (And some general thoughts for others with similar problems)

post by Rasmus Eide (rasmus-eide) · 2020-10-23T19:42:41.567Z · LW · GW · 3 comments

This is a question post.



I have no idea how to start this post. I don't even have the resources (mostly health-related ones) to figure out the meta-strategy for reaching out for help, and this is like the billionth place I've tried these last weeks, since things became too urgent to do things the right way. My instincts and social anxiety are screaming at me that I am burning bridges and am going to be downvoted, hated, and booed for the rest of eternity by everyone I consider high status for posting this mess. But no matter how many times I run the numbers to try to find an excuse not to post this, they still come out saying I have to write this even if it ruins my life. So here goes nothing; please don't downvote.

There's a topic that really deserves a long article that is beyond my resources to create, but I'd probably call it something like "The expert, the hobbyist, and the 998 crackpots", a simplified version of a real scenario that will probably happen at some point and might be happening right here. The idea is basically that a hobbyist has had a critical insight and needs to get it to the expert, the expert desperately needs that insight, and everyone knows this. But there are also 998 crackpots who each have their own useless garbage insight, and the expert needs to spend a significant amount of time to determine whether a given contact is the hobbyist or one of the crackpots. Due to the nature of crackpots and impostor syndrome, the hobbyist or crackpot cannot tell from the inside which one they are. As such, even though each one (me) knows they are far more likely to be a crackpot than the real hobbyist, they all need to transmit the insight they have and treat it as potentially important, and all these players together need to find a way for the expert to find the one useful needle in a haystack of 999 insights.

Now, there is a partial solution we've been using: this website, with community-vetted articles and votes to filter and iterate on ideas, and the experts only reading the ones that make it through this process. But it fails in two important cases that both apply here (and potentially others that I haven't thought of):

First, the hobbyist ends up being someone like me: someone who can't write articles to the needed standard, doesn't have the psychology needed to participate in the social context, and is simply too ill and lacking in capabilities to get their ideas into the system. This might seem unlikely, but remember that often the root of a new insight does not require competence or knowledge, it merely requires novelty and thinking in a different way and out of the box compared to everyone else, and this kind of thing correlates with mental illness.

Secondly, it fails if the idea isn't known to be safe, either because there's a specific possible danger, or because the person who had the original insight does not have the skills required to evaluate the degree of danger the idea may or may not pose. Arguably, some very broad reference classes of ideas have that last issue if the person who has them doesn't happen to be an expert in the field already. Over the years we've already had a few close shaves with ideas that turned out to carry small amounts of danger and would have been better off never being published, but the person who originally had them didn't realize that danger. In my opinion, if you have an idea that you suspect might possibly be dangerous, and you don't know how to evaluate the degree of danger, you have to treat it as potentially apocalyptically dangerous, even if you don't see how it could be and assign an extremely small probability that it is.

I don't know what the solution might be for this, but one obvious patch that comes to mind is for the expert to delegate the task of talking to each crackpot one at a time, over email or something, to one of the many people who seem to exist whom they trust personally, who have a moderate knowledge of the safety-relevant topics, but whose time is less valuable than that of the people who work directly on important things. But I am not aware that such a person exists, so if they do, it needs to be more clearly advertised.

So... please help, what should I do?


answer by Radford Neal · 2020-10-24T01:33:04.182Z · LW(p) · GW(p)

I think you should share it with one other person.  Almost any other person, as long as you have reason to think they are knowledgeable enough to understand it, and that they will take you seriously enough to listen to the idea.  Since you're talking about a (small) chance of billions of deaths, they don't have to be ethical paragons - a very high fraction of people will not want that to happen, even if they're otherwise rather awful people.  

This person will then be in a position to give you more useful advice, starting with whether or not your worries are at all rational.  (If they're not, and are instead a sign of illness, then I guess it would be good if the person was not too unkind...)

comment by Rasmus Eide (rasmus-eide) · 2020-10-24T16:54:05.952Z · LW(p) · GW(p)

That was the plan from the very beginning, but I don't know anyone like that IRL, and didn't manage to get in contact with anyone over email or Discord after trying several times. Now I have, though, so it's what I'm doing right now.

answer by interstice · 2020-10-24T03:50:00.893Z · LW(p) · GW(p)

This might seem unlikely, but remember that often the root of a new insight does not require competence or knowledge, it merely requires novelty and thinking in a different way and out of the box compared to everyone else, and this kind of thing correlates with mental illness.

I think the history of most intellectual disciplines doesn't really reflect this; most good new ideas are produced by people with lots of familiarity with their chosen area, and build extensively on ideas that already exist. Usually, 'ability to think novel thoughts' is not the limiting factor.

That said, if you want someone to talk about your idea with in private, maybe you should try direct messaging someone who you consider knowledgeable, but who doesn't seem likely to be super busy, e.g. someone with insightful LW posts who maybe doesn't work for MIRI. I'd be willing to.

comment by Rasmus Eide (rasmus-eide) · 2020-10-24T16:55:48.252Z · LW(p) · GW(p)

Agreed, about "most intellectual disciplines", and even more so when it comes to something like art, game design, or startup entrepreneurs. However, I think AI risk is one of the exceptions to this rule, quite strongly. 

answer by Dagon · 2020-10-23T22:02:09.847Z · LW(p) · GW(p)

[aside: I'd drop the first paragraph. I very nearly stopped reading and downvoted based on the fact that you expected me to.]

You'd perhaps be surprised just how commonly even rare or unusual ideas occur. We're nearing 8 billion concurrent humans, at least half of them can generate ideas, and probably at least 1% can generate plausibly-good ideas. That said, idea-space is BIG (cue Douglas Adams quote), so it's believable that this one is the exception. But that also explains why most ideas from crackpots, many from hobbyists, and some from experts are garbage. Ideas are also, as you say, expensive to validate, so a lot just get lost until re-discovered by someone who can make use of them.

I don't think you have any easy path but to either sit on it or publish it.  Or invest enough (if you can) to become a hobbyist-expert yourself, in order to actually evaluate it, and to identify and gain trust with the people who can make a proper evaluation.  That's not an option for most people, though.

My prior is that the VAST majority of non-obvious ideas are harmless, and I'm constitutionally unable to keep quiet about things I think are interesting, so I'd just post it here (as shortform if it's really raw) and see if it gets any traction.

comment by Rasmus Eide (rasmus-eide) · 2020-10-24T00:49:41.662Z · LW(p) · GW(p)

Do I really come off as such a complete idiot that these things wouldn't be obvious and already accounted for? I have a billion ideas, some of which I've been sitting on for decades hoping for my health to get better. I'm already a "hobbyist-expert" and have spent most of my life on these questions, but due to chronic illness and not being a one-in-a-million genius I probably won't ever be able to work professionally.

I wouldn't have posted this here if it wasn't literally a billions-of-lives-on-the-line situation, where another week of waiting and trying to express it better might mean disaster. Doing it this way is extremely painful and humiliating to me, but I have tried everything else and I can't see any way to avoid it that's ethically defensible. I am extremely disappointed and frightened by your uncharitable reading, and my every instinct is screaming at me to delete the post, but I can't. People like you are why this took weeks and almost didn't get posted, causing severe harm and risk, and your behavior is extremely irresponsible and dangerous.

Your prior about non-obvious ideas being harmless has huge amounts of evidence against it: both this site's history and the fact that all experts and important organizations seem to take the risk quite seriously. And I can see several ways my idea in particular could cause significant harm, even if they are on the whole unlikely.


Comments sorted by top scores.

comment by Ben Pace (Benito) · 2020-10-24T06:19:41.057Z · LW(p) · GW(p)

PM'd, to chat about it more.

comment by Ben Pace (Benito) · 2020-11-09T19:52:50.633Z · LW(p) · GW(p)

I exchanged about a thousand words of text with Rasmus (and he back), but I wasn't able to help much, and I don't fully understand his idea. But I don't think his idea is dangerous, I don't think he should be acting as though it needs to be kept secret, and I think it would be fine for a shortform post.

comment by Rasmus Eide (rasmus-eide) · 2020-10-24T16:56:35.714Z · LW(p) · GW(p)

Thanks, this is exactly what I was hoping for!