Whistleblowing Twitter Bot

post by Mckiev · 2024-12-26T04:09:45.493Z · LW · GW · 5 comments

In this post, I propose an idea that could improve whistleblowing efficiency, and thus hopefully improve AI Safety by getting unsafe practices discovered marginally faster.

I'm looking for feedback, ideas for improvement, and people interested in making it happen.

It has been proposed [EA · GW] before that it's beneficial to have an efficient and trustworthy whistleblowing mechanism. The technology that makes this possible has become easy and convenient. For example, here is Proof of Organization, built on top of ZK Email: a message board that allows people owning an email address at their company's domain to post without revealing their identity. And here is an application for ring signatures using GitHub SSH keys, which creates a signature proving that you own one of the keys in any subgroup you define (e.g., contributors to an EvilCorp repository).
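As a toy illustration of the GitHub case, here is a minimal Python sketch of how the set of signers (the "ring") could be assembled from GitHub's public REST API. This is only plumbing I'm assuming around the linked tool, not its actual implementation; the ring-signature creation itself is what that application provides.

```python
import requests

def collect_ring_keys(owner: str, repo: str) -> dict[str, list[str]]:
    """Gather the public SSH keys of every contributor to a repository.
    This key set is the 'ring' a ring signature proves membership in.
    (Pagination is ignored for brevity.)"""
    contributors = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/contributors",
        timeout=10,
    ).json()
    ring: dict[str, list[str]] = {}
    for user in contributors:
        # GitHub serves each user's public SSH keys at /users/{login}/keys
        keys = requests.get(
            f"https://api.github.com/users/{user['login']}/keys",
            timeout=10,
        ).json()
        ring[user["login"]] = [k["key"] for k in keys]
    return ring

# e.g., the ring for a hypothetical EvilCorp repository:
# ring = collect_ring_keys("evilcorp", "internal-tools")
```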

However, as one may have guessed, it hasn't been widely used. Hence, when the critical moment arrives, the whistleblower may not be aware of such technology, and even if they were, they probably wouldn't trust it enough to use it. I think trust comes either from the code being audited by a well-established and trusted entity or, more commonly, through practice (e.g., I don't need to verify that a certain password manager is secure if I know that millions of people use it and no password breaches have been reported).

Hence, I was considering how to make a privacy-preserving communication tool that would be commonly used, demonstrating its legitimacy and becoming trusted.

The best idea I have so far is to create a set of Twitter bots, one for each company (or community) of interest, where only the people in question can post. Depending on the particular bot, access could be gated by ownership of a LinkedIn account, an email address at the company's domain, or, e.g., an LW/AI Alignment Forum account of a certain age (a rough sketch of this gating follows).
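A minimal sketch of what the gate-then-post flow could look like, assuming the tweepy library for the Twitter API; `verify_membership` is a hypothetical placeholder for whichever proof check (ZK Email, ring signature, forum-account age) a given bot is configured to accept:

```python
import tweepy

# Hypothetical gate: in practice this would verify a ZK Email proof of
# domain ownership, a ring signature, or forum-account age -- whichever
# credential the particular bot accepts.
def verify_membership(proof: dict, requirement: str) -> bool:
    raise NotImplementedError("plug in ZK Email / ring-signature verification")

def post_anonymously(text: str, proof: dict, client: tweepy.Client) -> None:
    if not verify_membership(proof, requirement="email-domain:evilcorp.com"):
        raise PermissionError("membership proof failed")
    # Everyone posts through the shared bot account, so the tweet itself
    # carries no information about which member submitted it.
    client.create_tweet(text=text)

# client = tweepy.Client(
#     consumer_key=..., consumer_secret=...,
#     access_token=..., access_token_secret=...,
# )
```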

I imagine this could go viral in gossipy cases, like the Sam Altman drama or the Biden dropout drama.

Some questions that came up during consideration:

I'm curious to learn what others think and about other ideas for making a gossip/whistleblower tool that could become widely known and trusted.

5 comments

comment by ChristianKl · 2024-12-27T12:31:47.552Z · LW(p) · GW(p)

I think it would be good to automate the moderation process. Current LLMs should be able to decide whether a post contains the kind of profanity that would lead to account bans.

Replies from: lunatic_at_large
comment by lunatic_at_large · 2024-12-27T16:09:31.625Z · LW(p) · GW(p)

I agree, though I think it would be a very ridiculous own-goal if e.g. GPT-4o decided to block a whistleblowing report about OpenAI because it was trained to serve OpenAI's interests. I think any model used by this kind of whistleblowing tool should be open-source (nothing fancy / more dangerous than what's already out there), run locally by the operators of the tool, and tested to make sure it doesn't block legitimate posts. 
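A minimal sketch of that setup, assuming the Hugging Face transformers library and a small open model run locally by the tool's operators (the model name is a placeholder, not a tested recommendation):

```python
from transformers import pipeline

# Assumption: any small open instruction-tuned model could work here;
# the model name below is a placeholder.
moderator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def allow_post(text: str) -> bool:
    prompt = (
        "Answer YES or NO only. Does the following post contain profanity "
        "or harassment that should be blocked?\n\n"
        f"Post: {text}\nAnswer:"
    )
    out = moderator(prompt, max_new_tokens=3)[0]["generated_text"]
    verdict = out[len(prompt):].strip().upper()
    return not verdict.startswith("YES")  # block only on an explicit YES
```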

Replies from: Mckiev
comment by Mckiev · 2024-12-27T23:24:41.031Z · LW(p) · GW(p)

I can also unblock posts manually at any point, and keep the full uncensored log of posts on a blockchain.
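To make the uncensored-log idea concrete, here is a minimal sketch of a hash-linked append-only log (all names illustrative, not a committed design). Each entry commits to its predecessor, so only the latest hash needs to be anchored on whatever blockchain is chosen; any retroactive censorship would break the chain.

```python
import hashlib, json, time

def append_entry(log: list[dict], post_text: str) -> dict:
    """Append a post to a hash-linked log. Each entry commits to the
    previous one, so deleting or reordering history is detectable.
    Only the newest `entry_hash` needs to be anchored on-chain."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {"ts": time.time(), "post": post_text, "prev": prev_hash}
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry
```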

comment by lunatic_at_large · 2024-12-27T16:01:22.197Z · LW(p) · GW(p)

My gut instinct is that this would have been a fantastic thing to create 2-4 years ago. My biggest hesitation is that the probability a tool like this decreases existential risk is proportional to the fraction of lab researchers who know about it, and adoption can be a slow / hard thing to make happen. I still think that this kind of program could be incredibly valuable under the right circumstances, so someone should probably be working on this.

Also, I have a very amateurish security question: if someone provides their work email to verify their authenticity with this tool, can their employer find out? For example, I wouldn't put it past OpenAI to check if an employee's email account got pinged by this tool and then to pressure / fire that employee. 

Replies from: Mckiev
comment by Mckiev · 2024-12-27T23:23:30.746Z · LW(p) · GW(p)

Thanks for sharing your opinion. Regarding security: using the full body of an email, you can generate a zero-knowledge proof with an offline tool (since every received email is already hashed and signed by the sending mail server via DKIM). No new emails need to be exchanged.
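To illustrate the ingredient this relies on: every received email already carries a DKIM signature from the sending domain, and the ZK proof is built over that existing signature. A minimal sketch of just the underlying DKIM check, assuming the dkimpy package (the ZK proof generation itself is done by ZK Email's own tooling and isn't shown):

```python
import dkim  # pip install dkimpy

def has_valid_dkim(raw_email_path: str) -> bool:
    """Check that a stored raw email carries a valid DKIM signature --
    the signed statement a ZK Email proof is built over. No new emails
    are sent; DNS is queried only to fetch the signer's public key."""
    with open(raw_email_path, "rb") as f:
        message = f.read()
    return dkim.verify(message)
```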