Conjecture: Internal Infohazard Policy

post by Connor Leahy (NPCollapse), Sid Black (sid-black), Chris Scammell (chris-scammell), Andrea_Miotti (AndreaM) · 2022-07-29T19:07:08.491Z · LW · GW · 6 comments

Contents

    Overview and Motivation
    Considerations
  The Policy 
    Introduction
    Rules
      Secret
      Private
      Public
    Processes
      1. Assigning Disclosure Levels
      2. Sharing Information
      3. Policy Violation Process
      4. Information Security and Storage
      5. Quarterly Policy Review
  Additional Considerations
    Example Scenarios
    Best Practices
    Psychological Safety

This post benefited from feedback and comments from the whole Conjecture team, as well as others including Steve Byrnes, Paul Christiano, Leo Gao, Evan Hubinger, Daniel Kokotajlo, Vanessa Kosoy, John Wentworth, and Eliezer Yudkowsky. Many others also kindly shared their feedback and thoughts, formally or informally, and we are thankful for everyone's help on this work.

Much has been written on this forum about infohazards, such as information that accelerates AGI timelines, though very few posts attempt to operationalize that discussion into a policy that can be followed by organizations and individuals. This post makes a stab at implementation.

Below we share Conjecture’s internal infohazard policy as well as some considerations that we took into account while drafting it. Our goal with sharing this on this forum is threefold: 

  1. To encourage accountability. We think that organizations working on artificial intelligence - particularly those training and experimenting on large models - need to be extremely cautious about advancing capabilities and accelerating timelines. Adopting internal policies to mitigate the risk of leaking dangerous information is essential, and being public about those policies signals commitment to this idea. I.e., shame on us if we break this principle.
     
  2. To promote cross-organization collaboration. While secrecy can hurt productivity, we believe that organizations will be able to work more confidently with each other if they follow similar infohazard policies. Two parties can speak more freely when they mutually acknowledge what information is sharable and to whom it can be shared, and when both show serious dedication to good information security. A policy that formalizes this means that organizations and individuals don’t need to reinvent norms for trust each time they interact.

    Note that at the current level of implementation, mutual trust relies mostly on the consequence of "if you leak agreed-upon secrets your reputation is forever tarnished." But since alignment is a small field, this seems to carry sufficient weight at current scale.
     
  3. To start a conversation that leads to better policies. This policy is not perfect, reviewers disagreed on some of the content or presentation, and it is guaranteed that better versions of this can be made. We hope that in its imperfection, this policy can act as a seed from which better policies and approaches to handling infohazards can grow. Please share your feedback!

Overview and Motivation

“Infohazard” is underspecified and has been used to mean both “information that directly harms the hearer such that you would rather not hear it” and “information that increases the likelihood of collective destruction if it spreads or falls into the wrong hands.”

At Conjecture, the kinds of infohazards we care about are those that accelerate AGI timelines, i.e., that advance the capabilities of companies, teams, or people without corresponding progress in alignment. Due to the nature of alignment work at Conjecture, it is inevitable that some employees will work on projects that are infohazardous in nature, as insights about how to increase the capabilities of AI systems can arise while investigating alignment research directions. We have implemented a policy to create norms that protect this kind of information from spreading.


The TL;DR of the policy is: Mark all internal projects as explicitly secret, private, or public. Only share secret projects with selected individuals; only share private projects with selected groups; share public projects with anyone, but use discretion. When in doubt, consult the project leader or the “appointed infohazard coordinator”.


We need an internal policy like this because trust does not scale: the more people who are involved in a secret, the harder it is to keep. If each person independently keeps all Conjecture-related infohazard secrets with probability 99% / 95% / 90%, the probability that all 30 people do so drops to 74% / 21% / 4%. This implies that if you share secrets with everyone in the company, they will leak out.
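The arithmetic behind these figures can be checked with a short sketch, under the simple model that each person keeps the secret independently with the same probability:

```python
def p_all_keep(p: float, n: int = 30) -> float:
    """Probability that all n people keep a secret, assuming each
    person keeps it independently with probability p."""
    return p ** n

for p in (0.99, 0.95, 0.90):
    print(f"p = {p:.2f}: P(all 30 keep the secret) = {p_all_keep(p):.0%}")
# p = 0.99: P(all 30 keep the secret) = 74%
# p = 0.95: P(all 30 keep the secret) = 21%
# p = 0.90: P(all 30 keep the secret) = 4%
```

Independence is an optimistic simplification; in practice one leak can trigger others, which only strengthens the conclusion.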

Our policy leans conservative because leaking infohazardous information could lead to catastrophic outcomes. In general, reducing the spread of infohazards means more than just keeping them away from companies or people that could understand and deploy them. It means keeping them away from anyone, since sharing information with someone increases the opportunities it has to spread.

Considerations

An infohazard policy needs to strike the right balance between what should and what should not be disclosed, and to whom. The following are a number of high-level considerations that we took into account when writing our policy: 

In other words, we need to balance many different considerations, not merely whether “it is an infohazard or not”.

The Policy 

(Verbatim from Conjecture’s internal document.) 

Introduction

This document applies to everyone at Conjecture, including employees, interns, and collaborators. Note that this policy is not retroactive; any past discussions on this subject have been informal. 

This policy applies only to direct infohazards related to AGI Capabilities. To be completely clear: this is about infohazards, not PR hazards, reputational hazards, etc.; and this is about AGI capabilities.

Examples of presumptive infohazards:

  1. Leaking code that trains networks faster
  2. Leaking a new technique that trains networks faster
  3. Leaking a new specific theory that leads to techniques that train networks faster
  4. Letting it be known outside of Conjecture that we have used/built/deployed a technique that already exists in the literature to train networks faster
  5. Letting it be known outside of Conjecture that we are interested in using/building/deploying a technique that already exists in the literature in order to train networks faster

1-3 are obvious. 4-5 are dangerous because they attract more attention to ideas that increase average negative externality. If in the future we want to hide more types of information that are not covered by the current policy, we should explicitly extend the scope of what is hidden.

Siloing of information and projects is important even within Conjecture. Generally any individual team member working on secret projects may disclose to others that they are working on secret projects, but nothing more. 

The default mantra is “need to know”. Does this person need to know X? If not, don’t say anything. Ideally, no one that does not need to know should know how many secret projects exist, which projects people work on, and what any of those projects are about. 

While one should not proactively volunteer that they are keeping a secret, we should strive for meta-honesty. This means that, when asked directly, we should be transparent that we are observing an infohazard policy that hides things, and explain why we are doing so. 

Rules

There are three levels of disclosure that we will apply.

We will consider these levels of disclosure for the following types of information:

Each project that is secret or private must have an access document associated with it that lists who knows about the secret and any whitelisted information. This document is a minor infosecurity hazard, but is important for coordination.

An appointed infohazard coordinator has access to all secrets and private projects. For Conjecture, this person is Connor, and the succession chain goes Connor → Gabe → Sid → Adam. When collaborating with other organizations on a secret or private project, each organization’s appointed coordinator has access to the project. This clause ensures there is a dedicated person to discuss infohazards with, help set standards, and resolve ambiguity when questions arise. A second benefit of the coordinator is strategy: whoever is driving Conjecture should have a map of what we are working on and what we are intentionally not working on. 

Leaking infohazardous information is a major breach of trust not just at Conjecture but in the alignment community as a whole. Intentional violation of the policy will result in immediate dismissal from the company. This applies to senior leadership as well. Mistakes are different from intentional leaking of infohazards. 

More details on the levels of disclosure are below, and additional detail on consequences and the process for discerning if leaked information was shared intentionally or not is discussed in “Processes”.

Secret

Private

Public

Processes

1. Assigning Disclosure Levels

For new projects: Whenever a new project is spun up, the appointed infohazard coordinator and the project lead will work together to assess whether the content of the project is infohazardous and whether it should be assigned as secret, private, or public. Each conversation will include: 

(1) what information the project covers

(2) in what forms the information about the project already exists, e.g., written, repo, AF post, etc.

(3) who knows about the project, and who should know about the project

(4) proposed disclosure level

If the project is determined to be secret or private, an access document must be created that lists who knows about the project and any whitelisted information. Any information about the project that currently exists in written form must be moved to and saved in a repository or project folder with permissions limited to those on the access document list.

Anyone can ask the appointed infohazard coordinator to start a project as a secret. The default is to accept. At Conjecture, the burden of proof is on Connor if he wants to refuse: he must raise an objection showing that the matter is complicated enough not to accept immediately, and the decision may change in the future. In general, any new technical or conceptual project that seems like it could conceivably lead to capabilities progress should be created as secret by default. 

(We will return to this clause after some months of trialing this policy to write better guidelines for deciding what status to assign projects).
 

For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.

When collaborating with another organization, there should be one or more individuals that both parties agree is trusted to adjudicate on the matter.

2. Sharing Information

Each project must have an access document associated with it that lists who knows about the information and what information is whitelisted to discuss more freely. This list will be kept in a folder or git repository that only members of the secret or private project have access to. 
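As an illustration only, an access document could be modeled as a small record like the sketch below. All field names, project names, and people here are hypothetical; the policy does not specify Conjecture's actual format.

```python
# Hypothetical sketch of an access document for a secret project.
# Every name and field here is illustrative, not Conjecture's real format.
access_document = {
    "project": "example-project",        # placeholder project name
    "disclosure_level": "secret",        # one of: "secret", "private", "public"
    "individuals_with_access": [
        "Alice Example",                 # project lead (hypothetical)
        "Bob Example",
    ],
    "whitelisted_information": [
        "Listed members may acknowledge the project's existence to each other.",
    ],
}

def may_share(person: str, doc: dict) -> bool:
    """For a secret project, information may only be shared with
    individuals explicitly listed on the access document."""
    return person in doc["individuals_with_access"]

print(may_share("Alice Example", access_document))  # True
print(may_share("Mallory", access_document))        # False
```

Keeping the list explicit and machine-checkable like this also makes the quarterly review of access documents straightforward: stale entries are visible at a glance.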

Secret information can only be shared with the individuals who are written on the access list. Anyone in a secret project may propose adding someone new to the secret. First discuss adding the individual with the project leader, and then inform all current members and give them a chance to object. If someone within the team objects, the issue is escalated to the appointed infohazard coordinator, who has the final word. If the team is in unanimous agreement, the coordinator gets a final veto (it is understood that the coordinator is supposed to only use this veto if they have private information as to why adding this person would be a bad idea). 

Private information can only be shared with members of groups who are written on the access list. Before sharing private information with person X, first check whether the private piece of information has already been shared with someone from the same group as X. Then, discuss general infohazard considerations with X and acknowledge which select groups have access to this information. Then, notify others at Conjecture that you have shared the information with X. In case of doubt, ask first. 

Public information can be talked about with anyone freely, though please be reasonable. 

For all secret and private projects, by default information sharing should happen verbally and should be kept out of writing (in messages or documents) when possible. 

3. Policy Violation Process

We ask present and future employees and interns to sign nondisclosure agreements that reiterate this infohazard policy. Intentional violation of the policy will result in immediate dismissal from the company. The verdict of whether the sharing was intentional or not will be determined by the appointed infohazard coordinator but be transparent to all members privy to the secret (i.e., at Conjecture, Connor may unilaterally decide, but has his reputation and trust at stake in the process).

C-suite members of Conjecture are not above this policy. This is imperative because so much of this policy relies on the trust of senior leaders. As mentioned above, the chain of succession on who knows infohazards goes Connor → Gabe → Sid → Adam; though actual succession planning is outside the scope of this document. If it is Connor who is in question for intentionally leaking an infohazard, Gabe will adjudicate the process with transparency available to members of the group privy to the secret. Because of the severity of this kind of decision, we may opt to bring in external review to the process and lean on the list of “Trusted Sources” above.

Mistakes are different from intentional sharing of infohazards. We will have particular lenience during the first few months that this policy is active as we explore how it is to live with. We want to ensure that we create as robust a policy as possible, and encourage employees to share mistakes as quickly as possible such that we can revise this policy to be more watertight. Therefore, unless the sharing of infohazardous information is particularly egregious, nobody will be fired for raising a concern in good faith.

4. Information Security and Storage

[Details of Conjecture’s infosecurity processes are - for infosecurity reasons - excluded here.]

5. Quarterly Policy Review

We will review this policy as part of our quarterly review cycle. The policy will be discussed by all of Conjecture in a team meeting, and employees will be given the opportunity to talk about what has gone well and what has not gone well. In particular, the emphasis will be on clarifying places where the policy is not clear or introduces contradictions, and adding additional rules that promote safety.

The quarterly review will also be an opportunity for Project Leaders to review access documents to ensure lists of individuals and whitelisted information for each project are up-to-date and useful.

This policy will always be available for employees at Conjecture to view and make suggestions on, and the quarterly review cycle will be an opportunity to review all of these comments and make changes as needed.

Additional Considerations

The information below is not policy, but is saved alongside Conjecture’s internal policy for employee consideration.

Example Scenarios

It is difficult to keep secrets and few people have experience keeping large parts of their working life private. Because of this, we anticipate some infohazardous information will leak due to mistakes. The following examples are common situations where infohazardous information could leak; we include potential responses to illustrate how an employee could respond. 

  1. You have an idea about a particular line of experimentation in a public project, but are concerned that some of the proposed experiments may have capability benefits. You are weighing whether to investigate the experiments further and whether or not you should discuss the matter with others. 

    Potential response: Consider discussing the matter in private with the project lead or appointed infohazard coordinator. If it is unknown whether information could potentially be infohazardous, it is safer to assume that it is. A secret project could be spun off from the public project to investigate how infohazardous it is. If the experimental direction is safe, it could be updated to be public. If the experimental direction is infohazardous, it could stay secret. If the experimental direction is sufficiently dangerous, the formerly public project could be made secret by following the process in “Assigning Disclosure Levels” in the policy.
     
  2. You are in the same situation and have an idea for a particular line of experimentation in a public project, but this time believe P(experiments result in capabilities boost) is very small but still positive. You are considering whether there is any small but positive probability with which you should act differently than scenario (1).

    Potential response: Ultimately, a policy should be practical. Sharing information makes people more effective at doing alignment research. There is always a small probability that things can go wrong, but if you feel that an idea has low P(experiments result in capabilities boost) while also being additive to alignment, you can discuss it without treating it as secret. That said, if you have any doubt as to whether this is the case or not in a particular situation, see scenario (1).
     
  3.  You are at a semi-public event like EAG and a researcher from another alignment organization approaches and asks what research projects you and other Conjecture employees are working on.

    Potential response: Mention the public projects. You may mention the fact that there are private and secret projects that we do not discuss, even if you are not part of any. If the individual is a member of one of the groups, you may mention the private projects the group the person belongs to is privy to.
     
  4. You are at an alignment coffee-time and someone mentions a super cool idea that is related to a secret project you are working on. You want to exchange ideas with this individual and are worried that you might not have the opportunity to speak in the future.

    Potential response: The fact that this is a time-limited event should not change anything. One must go through the process, and the process takes time. This is a feature and not a bug. Concretely, this means you do not discuss that secret project or the ideas related to the project with that person. Feel free, though, to learn more about how far along that person is with their idea.
     
  5. You are talking with people about research ideas and accidentally share potentially infohazardous information. You realize immediately after the conversation and are wondering if you should tell the people you just spilled information to that the ideas are infohazardous and should be kept secret.

    Potential response: Mention this to the project lead and appointed infohazard coordinator as soon as possible before returning to the people, and discuss what to do with them. Because these situations are highly context dependent it is best to treat each on a case by case basis rather than establishing one general rule for mistakes.
     
  6. You are at EAG and you come across someone talking publicly about an idea which is very similar to an infohazardous project you are working on. You are considering whether to talk to them about the risk of sharing that information.

    Potential response: This depends on how good you are with words. If you are confident you can hold this conversation without spilling the beans, go ahead. Otherwise, if you have any doubt, mention this to your project lead and the appointed infohazard coordinator.

Best Practices

The following are a number of miscellaneous recommendations and best practices on infohazard hygiene. Employees should review these and consider if their current approach is in line with these recommendations. 

Psychological Safety

Working on a secret project and not being able to talk about what you’re doing and thinking about can take an emotional toll. The nature of Conjecture (startup, generally young, mostly immigrants) means that for most employees, coworkers provide the majority of socialization, and a large aspect of socialization with coworkers is talking about projects and ideas.

On one hand, the difficulty of secret-keeping should be embraced. The fact that it takes an emotional toll is not a coincidence, and is well aligned with reality. Mitigations against this may make things worse, and we should default towards not employing people if they have difficulty holding secrets.

On the other hand, we do not currently have the bandwidth to be perfectly selective as to who we hire and assign to secret projects. And we can’t rely on people self-reporting that they'll be incapable of holding a secret before being hired or assigned to a project. Most people don't have a good counterfactual model of themselves.

Therefore psychological safety is not just a concern for the emotional well-being of employees but also for the robustness of this policy. Someone who is feeling stressed or isolated is more likely to breach secrecy. Emotional dynamics are just as real a factor in the likelihood that secrets get shared as the number of people who know the secret. In both cases we assume human fallibility. If we only ever hired infallible people, there would be no reason to have internally siloed projects.

Potential risk factors that amplify the likelihood that an infohazard is revealed:

As such, we will consider possible mitigations in our approach to infohazardous projects, such as not assigning people only to siloed projects, not siloing projects between collaborators who are used to being very open with each other, or adding a trusted emotional support person to project silos who knows only high-level details and not implementation details. Note that Conjecture will not guarantee following any of these steps, and therefore this is not policy but rather general considerations.

In general, employees reading this policy should understand that mental health and psychological safety are taken seriously at Conjecture, and that if they ever have concerns about this, they should raise them with senior management or whomever else they are comfortable speaking with. Rachel and Chris have both volunteered as confidants if individuals would prefer to express concerns to someone besides Connor, Gabe, or Sid. 

An additional emotional consideration is that it should cost zero social capital to have and keep something secret. This is very much not the default without a written policy, where it often costs people social capital and additional effort to keep something secret. The goal at Conjecture is for this not to be the case, and for anyone to be able to comfortably keep things secret by default without institutional or cultural pushback. We also intend for this policy to reduce overhead (the need to figure out bespoke solutions for how to handle each new secret) and stress (the psychological burden of keeping a secret). Having access to a secret is by no means a sign of social status. In that vein, a junior engineer might have access to things that a senior engineer does not. 

6 comments

Comments sorted by top scores.

comment by Ben Pace (Benito) · 2022-08-12T03:33:40.576Z · LW(p) · GW(p)

I think this is a fairly thoughtful document, thanks for writing it, and for sharing it here.

For current projects (changing disclosure levels): Anyone can propose changing the disclosure level of a project.

  • Secret → Private: To move a project from secret to private, all members of the project and the appointed infohazard coordinator must agree.
  • Private → Public: Before making public any information, all members of the project must agree. Also, members must consult external trusted sources and get a strong majority of approval.

My first impression about this is worrying.

  1. The move from Secret -> Private sounds like decision-by-consensus. I am against decisions-by-consensus inside any organization that needs to move fast and get things done. My first alternative proposal would be that one person owns the decision. Failing that, maybe something like any group of N people can vote to veto the decision-maker's decision, such that the decision-maker needs to get a certain level of buy-in.
  2. Consulting external trusted sources sounds like a very slow-moving process, the sort of thing where something you expect to take days actually takes months. A rule we often assume at Lightcone is that literally all external parties will move too slow for us and we should try to minimize occasions where we are blocked on them. I don't think that the idea stated in the OP is a bad one, but I might try to have the external parties agree to be voters in this, and have them agree to get back to you within e.g. 1 month of being requested to make their decision.

Broadly, I'd also add another point:

  • I've found that it's very difficult to make strategic decisions if you have nobody to talk to about them. It currently sounds like you plan for your CEO (Connor) to have a lot of secret information that no other single person in the organization has access to all of. My guess is that will make it very hard for Connor to think about the entire strategic landscape, because there will be no social context in which he can think about it all with another person. I would suggest having two people know about all secrets within the org, to the best of your ability.

Broadly I don't really know what to do about secrecy and find it very costly personally, so don't take any of my points too strongly.

Replies from: Benito
comment by Ben Pace (Benito) · 2022-08-12T03:59:07.566Z · LW(p) · GW(p)

Another thought:

infohazard coordinator

On first pass, I didn't pick up if this is always Connor or can be different people in Conjecture. Anyway, I think whoever it is should consider it an active responsibility to be very responsive to anyone's requests or queries. The default thing that happens when there's a person with massive power over a project but isn't in constant contact with the project, is that they slow everything down. Like if it were my job, I might be like... 

Okay, I really don't know, there's a bunch of factors. I don't know if the infohazard coordinator is actually on the team of the project they're coordinator for, I don't know how many projects they're coordinator for, and I don't know how fast requests need to be answered. Nonetheless, here's the kind of rule-set I can imagine making sense.

  • If anyone ever asks about a private project "Can I share with Person X" I should always answer same day, and maybe set a target to always answer in <3 hours.
  • If anyone wants to add a collaborator to a secret project, I should always give them an answer within 2 days.
  • If anyone wants to change the secrecy level of a project, I give my take on it within 5 working days, and generally should either say "no" or should set in-motion the plan to move it out of that secrecy level within 2 weeks of the initial request.

My guess is without this sort of ruleset, without the infohazard coordinator taking on the responsibility to respond extremely quickly whenever they're blocking a research team, at some point folks will be asking "Why did Project X not get finished 2 months faster?" and the answer will be "Well it was too costly for us to get in-sync with the infohazard coordinator about who we could share on this project because they were always busy with other projects, so we ended up not sharing our work with Alice, Bob, or Charlie until much later than we otherwise would have, and each time we did we got a big speed up."

comment by Lone Pine (conor-sullivan) · 2022-07-30T01:57:27.012Z · LW(p) · GW(p)

Thank you for sharing this information, Connor and team. I would be interested in hearing a follow up, say in six months. I'm sure it will help other people in the field to learn how this policy works out and how it evolves. (Although ironically, I imagine you will have trouble keeping us updated if doing so might inadvertently leak information.)

comment by elspood · 2022-09-12T00:51:45.485Z · LW(p) · GW(p)

This is a great draft and you have collated many core ideas. Thank you for doing this!

As a matter of practical implementation, I think it's a good idea to always have a draft of official, approved statements of capabilities that can be rehearsed by any individual who may find themselves in a situation where they need to discuss them. These statements can be thoroughly vetted for second- and higher-order information leakage ahead of time, instead of trying to evaluate in real-time what their statements might reveal. It can be counterproductive in many circumstances to only be able to say "I can't talk about that". It also gives people a framework to practice this skill ahead of time in a lower-stakes environment, and the more people who are already read in at a classification level have a chance to vet the statement, the better the chance of catching issues.

The downside of formalizing this process is that you end up with a repository of highly sensitive information, but it seems obvious that you want to practice with weapons and keep them in a safe, rather than just let everyone run around throwing punches with no training.

comment by Veedrac · 2022-08-12T01:41:02.344Z · LW(p) · GW(p)

If I understand the rules as written correctly, the more people in Conjecture know a Secret, the harder it is to change it to Private-to-Conjecture. This seems more like a minor bug than a weird quirk, IMO, and I'd personally prefer a scheme that scaled in less than perfect proportion with group size, both to respect the dynamics of Secrets in large groups better, and to dilute the decision this would currently give to small groups.

comment by Charbel-Raphaël (charbel-raphael-segerie) · 2022-08-03T07:54:23.870Z · LW(p) · GW(p)

For discussions between friends about capabilities, say for estimating timelines, if I need to convince a friend that timelines are short, what is the policy you would recommend?