OHGOOD: A coordination body for compute governance
post by Adam Jones (domdomegg) · 2024-05-04T12:03:16.716Z · LW · GW · 2 comments
This is a link post for https://adamjones.me/blog/oh-good/
Contents
- Motivation
- The idea
- 1. Registering new AI chips
  - Determining chips in scope
  - AI chip identifiers
  - Sending information to the non-profit body
- 2. Transferring ownership of AI chips
  - Low-risk transfers
- 3. Renting AI chips
- 4. Destroying AI chips
- 5. Determining relevant stakeholders and making information available
- 6. Encouraging compliance
  - International treaty
  - Sanctions-like framework
  - Domestic enforcement
  - Deposit scheme
- 7. Addressing future compute governance advances
  - Declassifying low risk chips
  - Expanding hardware governance
  - Advanced compute governance measures
  - Unknown Unknowns
- Risks
  - Privacy risks
  - Promoting arms race dynamics
  - Barriers to entry
  - Security risks
- Governance at Scale
- Request for feedback
- Acknowledgements
Core to many compute governance proposals is having some kind of register that records who owns AI chips.
This article explores how this register could be implemented in practice, outlining an organisation that maintains such a register and its necessary processes. It's named OHGOOD, the Organisation Housing the GPU & Others' Owners Database.
Motivation
Training highly-capable and broad AI systems requires lots of compute and data, along with efficient algorithms.
Compute is the easiest of these three to track, since it currently relies on specialised, expensive AI chips that only a few actors in the world can produce. Both data and algorithms are comparatively much harder to track: public datasets such as Common Crawl (a set of 250 billion web pages) can be downloaded by anyone, and key algorithmic breakthroughs that have enabled recent AI advances are published in scientific journals.
[More definitions on the above are in the linked post]
By tracking AI chips, the hope is that we can identify the actors capable of training highly-capable and broad AI models, and thus catch many of the riskiest models. Ideally, we could then verify that these actors are using their AI chips safely.
Previous work in compute governance has briefly touched on the need for this tracking body:
- Shavit[1] put forward a framework for enforcing rules about large-scale ML training, by recording snapshots of work done by AI chips (via on-chip firmware) and then requiring developers to show each snapshot is part of a compliant training run. Section 6.1 explains that a ‘chip-owner’ directory (corresponding to this proposal) is needed to be confident a developer is reporting all of its training activity.
- Baker[2] analysed the use of verification in nuclear arms control with a view to how it could be applied to a future AI safety treaty. Annex G describes how AI chip accounts might be verified, using methods analogous to nuclear arms control verification. The bodies responsible for these accounts correspond to this proposal.
This post explores the various functions such a body would need to carry out in more detail, as well as some potential incentive schemes.
[A few limitations of compute governance are in the linked post]
The idea
We propose an international non-profit body that keeps a register of AI chips and their owners. This register should be:
- accurate and up-to-date
- trusted or at least mostly verifiable
- accessible, in the sense that relevant stakeholders (e.g. nation states wanting to ensure compliance with a future treaty) can query or view the register
- international, given that AI chips are likely to move between countries as part of complex supply chains and that there is interest in global compute governance
The initial mission of this non-profit could therefore be:
Ensure the responsible use of AI chips, by making accurate, up-to-date and trusted information about global AI chip control easily accessible to relevant stakeholders.
In practice this is likely to involve:
- Recording when new AI chips are created
- Handling transferring ownership of AI chips
- Handling renting AI chips
- Handling destroying AI chips
- Determining who the ‘relevant stakeholders’ are, and making the information available to them
- Encouraging compliance with all the relevant procedures
- Evolving to address advances in compute governance
We explore these processes in further detail below.
[A few more considerations are in the linked post]
1. Registering new AI chips
When AI chips are created, they should have some kind of unique identifier. This identifier should be sent to the non-profit body with details about the chip.
Determining chips in scope
Chips in scope should be those that could feasibly be used as a significant part of training a high-risk AI model.
In most cases it’ll likely be obvious whether a chip falls under this definition; however, there will be some edge cases where it is unclear. Where things are unclear, there are general arguments both for and against inclusion.
Including ambiguous chips maximises coverage and therefore reduces the chance that important chips go untracked. It’s easy to later remove chips from the register if it becomes clear they do not meet the definition, but much harder to track them down and add them.
Excluding these chips reduces the scope of the organisation, and could make it easier to get buy-in from other actors given less would be required of them.
This definition is still fairly broad. Further work could help develop a more precise one. Doing so will be difficult, as it may need to be resistant to organisations working around the definition - for example, in 2023 NVIDIA developed the H800 and H20 chips to work around US export controls on AI chip sales to China.
AI chip identifiers
Most manufacturers of AI chips already issue serial numbers to devices, and so are used to generating unique identifiers for their chips. However, going beyond serial numbers, there are a few additional properties that would make identifiers more useful for compute governance.
The identifiers should be hard to remove. Ideally, removing the identifier would make the chip inoperable, or at the least any tampering should be obvious on inspection.
In addition, it should be hard to forge identifiers. This prevents bad actors from pretending to hold chips in certain locations, or to use them for certain purposes (and thereby passing inspections), while actually using the real chips elsewhere or for other ends.
One way to achieve forge resistance could be to use cryptography. For example, a tamper-resistant secure element could be added to the chip, similar to Apple’s Secure Enclave root cryptographic keys. This could hold a key unique to the chip, used to sign data, so that each chip produces a unique and mathematically difficult-to-forge signature. This would significantly increase the complexity of forging chips, while not significantly increasing costs: external chips implementing secure elements can be bought for under a dollar - a trivial addition to the $40,000+ retail price of AI chips (though for security purposes the secure element would need to be on the same silicon die as the AI chip itself, rather than an external chip that could more easily be swapped out).
Lastly, it should be possible to query the identifiers via software. This is likely to complement the identifiers being difficult to remove and forge, and makes it easier to remotely gain some assurance that the chips are genuine. While remote inspections won’t give perfect proof, they could serve as a low-cost type of inspection that can be done more frequently and at larger scale than a manual inspection, and gives some additional confidence in the control of the chips (especially if cryptographic measures are implemented).
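To illustrate, here is a minimal sketch in Python of how such a challenge-response check might work, using the `cryptography` library. The names and flow are illustrative assumptions: a real scheme would also involve manufacturer certificate chains and hardened on-die key storage.

```python
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes
import os

# At manufacture: a key pair is generated inside the chip's secure element.
# The private key never leaves the die; the public key is registered with
# the non-profit alongside the chip's serial number.
chip_private_key = ec.generate_private_key(ec.SECP256R1())
registered_public_key = chip_private_key.public_key()

# At inspection: the verifier sends a fresh random challenge, the chip
# signs it, and the verifier checks the signature against the public key
# held on the register.
challenge = os.urandom(32)
signature = chip_private_key.sign(challenge, ec.ECDSA(hashes.SHA256()))

# Raises InvalidSignature if the responding chip's key does not match
# the register entry, i.e. the chip is not the one it claims to be.
registered_public_key.verify(signature, challenge, ec.ECDSA(hashes.SHA256()))
print("chip attestation verified")
```

Using a fresh random challenge each time matters: an attacker who recorded an earlier valid signature could not simply replay it.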
While perfect anti-forgery measures are hard to attain given people have unlimited physical access to the chips, adding these safeguards would make it much harder for even well resourced bad actors to hide chip ownership. The increased complexity would serve as a strong deterrent against forgery itself, make it more probable that the forger makes mistakes that could reveal their actions, and require additional people to be involved, therefore making it more likely that one of them exposes the scheme.
While the above properties would make for good chip identifiers, current chips not implementing this should not be seen as a reason to block or delay founding a chip tracking body. Starting by tracking lower-quality identifiers would still be valuable in the interim.
[A very brief review of how well current chips implement this is in the linked post]
Sending information to the non-profit body
At time of creation, it should be relatively simple for the manufacturer to send information about the chip to the non-profit, e.g. over an API.
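As a sketch, a registration call might look something like the following. The endpoint, field names and payload schema are hypothetical - the real interface would be defined by the non-profit:

```python
import requests

# Hypothetical endpoint -- illustrative only
REGISTER_URL = "https://api.ohgood.example/v1/chips"

new_chip = {
    "chip_id": "NV-H100-0000421337",   # unique identifier from the section above
    "model": "H100 SXM5",
    "manufacturer": "NVIDIA",
    "manufactured_at": "2024-05-01T09:30:00Z",
    "attestation_public_key": "-----BEGIN PUBLIC KEY-----\n...",
    "initial_owner": "acct_nvidia",    # manufacturer's verified register account
}

resp = requests.post(
    REGISTER_URL,
    json=new_chip,
    headers={"Authorization": "Bearer <manufacturer-api-token>"},
    timeout=10,
)
resp.raise_for_status()  # surface registration failures rather than dropping them
```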
Incentives for manufacturers to do this are discussed in the ‘encouraging compliance’ section below.
2. Transferring ownership of AI chips
When chips are bought and sold, the register needs to be updated with the new owner of the chips.
At a minimum, the buyer must confirm the transfer: otherwise, a seller could falsely claim to have passed chips on to a buyer without the transfer actually having happened. Additionally, the buyer is the party who could prove possession of the chips if the identifiers had the cryptographic measures detailed above (whereas it’s hard for the seller to prove non-possession in the same way).
However, in practice it is likely useful for both parties to confirm the details before the register is updated. This reduces the chance of mistakes, and could make the seller liable for incorrect updates. Additionally, requiring the buyer to install each chip and extract a cryptographic signature to prove ownership is likely infeasible for intermediaries such as resellers.
Where this transfer of ownership takes place, each party’s identity needs to be appropriately verified, to ensure chips are genuinely being transferred to the expected organisation. This may require KYC (Know Your Customer) processes at some stage: likely when an account is set up with the non-profit, to avoid needing to repeat KYC for every transaction. To prevent organisations purchasing AI chips through shell companies or similar, they should be required to declare their true key owners - similar to ‘people with significant control’ or ultimate beneficial owner requirements.
Information about the transfer should be sent to the register in a timely fashion, for example within 7 days of it occurring.
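To make these mechanics concrete, here is a rough sketch of the dual-confirmation and reporting-window logic described above. The data shapes are illustrative assumptions, not a proposed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

REPORTING_WINDOW = timedelta(days=7)  # the "timely fashion" threshold above

@dataclass
class Transfer:
    chip_ids: list[str]
    seller: str                  # KYC-verified register account of the seller
    buyer: str                   # KYC-verified register account of the buyer
    occurred_at: datetime
    seller_confirmed: bool = False
    buyer_confirmed: bool = False

def apply_transfer(register: dict[str, str], t: Transfer) -> None:
    """Update the chip -> owner register once both parties have confirmed."""
    if not (t.seller_confirmed and t.buyer_confirmed):
        raise ValueError("both parties must confirm before the register updates")
    if datetime.now(timezone.utc) - t.occurred_at > REPORTING_WINDOW:
        raise ValueError("transfer reported late: flag for investigation")
    for chip_id in t.chip_ids:
        if register.get(chip_id) != t.seller:
            raise ValueError(f"{chip_id} is not registered to {t.seller}")
        register[chip_id] = t.buyer
```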
Incentives for stakeholders to accurately record transfers are discussed in the ‘encouraging compliance’ section below.
Low-risk transfers
Chips capable of training AI models often overlap with chips for other uses. For example, high-end GPUs that could train AI models can also be used for playing video games, rendering video content, or mining cryptocurrency.
[A very brief analysis of overlap between AI chips and other chips is in the linked post]
Tracking small scale purchases of these chips, where it seems highly unlikely that the chip will be used for high-risk AI training, may create unnecessary overheads and privacy risks, particularly for individual consumers.
Thresholds should be put in place to determine when chips are transferred to low-risk owners and can stop being tracked; a toy sketch of such a screen follows the list below. This is likely to be based on a combination of:
- Chip type: e.g. $40,000+ chips, or chips designed almost solely for AI use cases, should be in scope
- Purchase quantity: e.g. buying thousands of consumer-level GPUs might be in scope
- Buyer information: e.g. whether they’re an individual or business, use of cryptic or vague identities, use of unusual payment methods, recent purchases of large amounts of RAM, network cards, or server motherboards. Lessons can likely be learnt from anti-money laundering processes.
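As flagged above, a toy version of such a screen might look like the following (all thresholds are illustrative, not proposed policy):

```python
def is_low_risk_transfer(unit_price_usd: float, quantity: int,
                         buyer_risk_flags: int) -> bool:
    """Toy screen combining the three signals above. Returns True when the
    chips can stop being tracked after the sale."""
    if unit_price_usd >= 40_000:   # high-end data-centre chips: always tracked
        return False
    if quantity >= 1_000:          # bulk purchases of consumer GPUs: tracked
        return False
    if buyer_risk_flags >= 2:      # e.g. vague identity + unusual payment method
        return False
    return True                    # small consumer purchase: stop tracking
```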
Where a transfer happens to a low-risk owner, this should still be recorded so that it is clear the transfer has occurred. This record should contain metadata, such as a retailer’s order ID, so that details about the purchase can be investigated should the AI chip later be found in the possession of an organisation training high-risk AI models.
An extension might be to require very high-end chips to have certain features unique to AI training permanently disabled before they are untracked. Such a policy would have to carefully balance the risk of the chip being used for dangerous AI training against the intentional destruction of chip capabilities that could be effectively applied elsewhere. This is likely only to be relevant in scenarios where AI chips are practical to use for other purposes e.g. for 3D rendering, which is not true of current AI chips.
Where large quantities of chips become untracked, for example at large electronics retailers selling GPUs, audits should take place to ensure the low-risk transfers are genuine.
Finally, where a chip has moved to a low-risk owner, but a high-risk owner wants to buy the chip from them, this should be recorded on the register. Here the high-risk owner should be responsible for recording it in the register correctly, given they are likely the party with more resources and a better understanding of how the register works.
3. Renting AI chips
Many AI chips are owned by cloud providers, and are rented out to users including top AI companies. Key players in this space include traditional cloud providers such as Amazon Web Services, Microsoft Azure and Google Cloud Platform, as well as AI-focused cloud providers such as CoreWeave and Lambda Labs. For example, OpenAI rent the compute to power their research, products and API services from Microsoft Azure, and Anthropic similarly rent their compute from Google Cloud and Amazon.
Understanding who is renting AI chips is therefore crucial to understanding who is ultimately controlling the AI chips, and potentially using them to train risky AI models.
Therefore the system needs to handle temporary transfers, as well as permanent transfers. Given the short-term nature of many such transfers (e.g. for on-demand hourly billing), the implementation of this process needs to be simple. One possible implementation could be delegating access to the cloud provider through an OAuth-like process to record rentals on the renter’s account.
Similar to low-risk transfers, an exception might be made for lower-risk rentals. Again, it should be recorded that the rental occurred, with some metadata that allows for further investigation - but full details about the renter might not be included.
Information about the rental should be sent to the register in a timely fashion, for example within 7 days of it starting. If at the time of reporting the end date of the rental is unknown, this should be noted and it should be updated when the end date is known. Curtailments or extensions of rental agreements should also be sent to the register.
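A sketch of what a rental record might contain, including how the open-ended case could be handled (the field names and shapes are assumptions for illustration):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Rental:
    chip_ids: list[str]
    provider: str                # cloud provider's register account
    renter: str                  # account the provider was delegated, OAuth-style
    start: datetime
    end: datetime | None = None  # None while open-ended (e.g. hourly on-demand)

def close_rental(rental: Rental, ended_at: datetime) -> None:
    """Report the end date once known; also used for curtailments/extensions."""
    rental.end = ended_at
```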
4. Destroying AI chips
When AI chips are destroyed, damaged beyond use, lost, or stolen, the register should be updated to note this. For large numbers of chips, this might warrant an in-depth investigation to ensure the chips have not been diverted to potentially dangerous uses.
We expect relatively few of these reports to involve large numbers of chips. Given that only chips powerful enough to be worth tracking are in scope, most will be highly valuable, and their owners will therefore be incentivised to handle them securely.
Where an organisation with a large number of these chips is upgrading to a newer version or similar, the old chips are likely to be sold to someone else rather than destroyed. The few exceptions are likely to be very security conscious customers, such as intelligence agencies.[3]
5. Determining relevant stakeholders and making information available
There are a wide range of stakeholders that AI chip ownership information might be relevant to. This presents a number of options for register information disclosure including:
State parties, only for information in their state: The strictest form might be that information is only available to state bodies, for AI chips in those states. This feels like a minimum, given a country could create legislation to force organisations to share this information anyway. This might help unilateral compute governance measures, e.g. understanding what competition looks like within a state. It would also still allow states to independently decide whether to publish the statistics publicly.
All state parties: All information on the register is shared with state bodies that have signed up to some kind of treaty. This differs from the above in that each state can see all other states’ AI chip ownership. In practice, states would designate some kind of body to receive this information, e.g. a national AI safety institute.
Trusted non-state parties: Information on the register is selectively shared with a group of trusted organisations, based on some review process. For example, to access the information you need to apply with a use case which would then be reviewed by a governance team. This is similar to Research Data Centres for US census data, or access to US or UK healthcare data via PCORnet or OpenSAFELY.
Full transparency: All information on the register is made public. This makes accessing the information for different purposes easy and avoids the need to guard the register from information disclosure given it’s already public. Other analogous organisations work like this, even with sensitive data: IANA’s IP ranges are public (highlighting addresses where military equipment is connected to the internet), and the IAEA makes the location of member state nuclear reactors public via their PRIS platform.
Full transparency of the AI chip register should be the default starting point. Making the information public has several benefits - it reduces opportunities to hide dangerous chip usage, enables broader research and understanding of the AI compute landscape, and builds public trust through transparency.
6. Encouraging compliance
There are a few ways compliance with the processes above could be achieved. We explore using an international treaty, a sanctions-like framework, domestic enforcement, and a deposit scheme.
International treaty
An international treaty signed by key countries could create obligations for member states to have organisations within their jurisdiction comply with AI chip ownership rules. It would also obligate states to enforce this law and properly resource any national body responsible for overseeing the system.
Peer pressure from other member countries via treaty meetings, as well as dispute resolution mechanisms for non-compliance, creates incentives to effectively implement the required legislation. This could be especially powerful when combined with a sanctions-like framework, detailed below.
Overall, this treaty would be similar to the Treaty on the Non-Proliferation of Nuclear Weapons, which obligates member states to track fissionable materials. It designated the IAEA as the international non-profit to audit compliance with the treaty (although the registers themselves are maintained by member states individually and data is shared with the IAEA, rather than the IAEA managing this information directly). Parallels between a potential AI treaty and existing nuclear treaties are explored more deeply by others.[2]
Sanctions-like framework
A properly maintained global AI chip register creates opportunities for enforceable sanctions on chip transfers.
Organisations with poor compliance records, or countries with lax registration laws, could be wholesale deemed ‘high risk’, forcing more scrutiny of chip transfers to those jurisdictions. Entities found repeatedly flouting registration rules or broader responsible AI commitments could effectively have their access to advanced chips cut off worldwide.
A register therefore turns non-compliance with AI commitments into enforceable reputational costs and transaction friction, backed up by a credible threat of cutting off access to leading AI compute. Over time this could shape markets towards responsible and trackable AI development.
In addition, this might make countries with lax AI regulations be seen as difficult to work in, given the extra due diligence required before receiving AI chips. This could create additional positive incentives to introduce effective AI regulation.
Domestic enforcement
Domestic regulators set up under the treaty should have the primary goal of ensuring the register is kept accurate and up-to-date. They should collaborate with the international non-profit to facilitate international inspections, investigate potential incidents, and explore ways to further encourage global compliance.
Regulators should be empowered to fully investigate missing or otherwise inaccurate registrations, and prosecute related offences. This will require properly resourcing them so they are able to effectively supervise powerful technology companies.
These register-compliance regulators could be part of wider AI regulators set up to enforce other related AI regulations.
Organisations that do not comply with rules around AI chip registration could receive fines or other penalties. Graduated penalties could distinguish accidental non-disclosure versus deliberate evasion or obstruction of oversight. Penalty size could also reflect company resources, from simple warning letters for smaller entities up to major fines or criminal charges for large multinationals willfully flouting obligations.
Deposit scheme
A deposit scheme would financially incentivise organisations to comply with AI chip registration. When producing a new chip, manufacturers would pay a refundable deposit of, say, 5% of the chip cost, which is returned to the then-owner in instalments, for example 5 equal payments over the next 5 years. The exact amounts and repayment schedules would have to be set high enough to encourage compliance during the period where the chip is still relevant to AI training, while balancing the increased cost of purchasing AI chips.
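As a worked example of the figures above (using the $40,000 chip price cited earlier; the 5% rate and five-year schedule are illustrative, not recommendations):

```python
CHIP_PRICE_USD = 40_000  # typical high-end AI chip retail price cited earlier
DEPOSIT_RATE = 0.05      # 5% of chip cost, paid by the manufacturer at creation
INSTALMENTS = 5          # equal annual repayments over 5 years

deposit = CHIP_PRICE_USD * DEPOSIT_RATE  # $2,000 held by the non-profit
per_instalment = deposit / INSTALMENTS   # $400/year paid to the then-owner

print(f"deposit: ${deposit:,.0f}, refunded at ${per_instalment:,.0f}/year")
```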
Random sampling could be used to ensure registrations were accurate and up-to-date. Where this unearths batches of AI chips with inaccurate registration data, some part of the deposit could be forfeited as a penalty. This incentivises actors owning AI chips to keep records accurate.
Additionally, deposits left unclaimed can indicate chips that were not properly registered, automatically alerting authorities to investigate the last known controlling organisation’s activities.
Compared to international treaties and domestic enforcement regimes, a deposit scheme is potentially easier to set up. This is because it only needs buy-in at one stage of the supply chain, which is a much narrower bottleneck than getting all countries involved in the transfer or use of AI chips to agree to a treaty.
[A brief analysis of AI chip supply chain bottlenecks is in the linked post]
7. Addressing future compute governance advances
The organisation set up to track AI chips should be forward-looking, to ensure it continues to appropriately encourage the safe use of AI chips. This section outlines future processes the organisation might consider.
Declassifying low risk chips
As AI chips age and become obsolete for cutting-edge AI work, the need to tightly track them diminishes. A declassification process could transition older chip generations to reduced or eliminated registration requirements.
Expanding hardware governance
While AI chips are the current focus, expanding hardware governance to other inputs to the AI training process could become necessary. This could include:
- raw silicon wafers
- lithography equipment
- high bandwidth memory
- high-speed or advanced storage devices
- very high-throughput network cards
The suggestions above are highly influenced by common methods of training and running today's AI models, particularly large language models. Currently, as well as top-end AI chips to do the computations, large amounts of high-bandwidth memory and network devices are needed to handle the large amounts of training data and model weight updates.
The exact types of hardware that should be considered for tracking need to be chosen with regard to the future AI chip supply chain, AI training methods, and understanding of how else these hardware components are used.
Advanced compute governance measures
After laying the groundwork for basic chip tracking, other more advanced governance approaches may be brought in, such as:
More granular location tracking: More precise locations of chips (e.g. which data centre) could help enable more in-depth verification measures and support investigations of lost chips.
Utilisation auditing: Telemetry or similar reporting could provide insight into the intensity and kinds of workloads being run on chips. For example, a retailer keeping chips in a storage warehouse to sell to retail customers is very different to them being at 100% usage in a data centre.
Training run compliance: Snapshots of training weights during model training could be taken, and later inspected to ensure the training run complied with future rules on safe AI training. Others have explored this in much more detail.[1]
These additional compute governance approaches would increase confidence that AI chips were not being used to create risky AI models. However, they would also place additional burdens on owners of AI chips.
A risk-based approach could be taken to introduce different governance measures. For example, large deployments of very high-end chips might be subject to the most strict and intensive measures, while smaller deployments of older or weaker chips might be subject to only simple ownership tracking.
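A toy sketch of how such risk-based tiering might be expressed (the categories and thresholds are invented for illustration):

```python
def governance_tier(is_frontier_chip: bool, deployment_size: int) -> list[str]:
    """Return the governance measures applying to a deployment, graduated
    by chip capability and scale, per the risk-based approach above."""
    measures = ["ownership tracking"]  # baseline for all tracked chips
    if is_frontier_chip:
        measures.append("granular location tracking")
    if is_frontier_chip and deployment_size >= 1_000:
        measures += ["utilisation auditing", "training run compliance"]
    return measures
```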
Unknown Unknowns
Finally, emerging technologies may necessitate tracking new metrics that are not initially obvious. The non-profit administering the register should continually survey the AI landscape for oversight gaps, and update governance controls as necessary.
This should include the possibility that compute governance is no longer a viable option to govern the development of high-risk AI models. While unlikely, this could happen if algorithmic breakthroughs or significant general hardware advances mean a much wider range of actors could train dangerous AI models, such that tracking compute did not add much value.
Risks
The above proposal comes with a few potential risks that should be considered before implementing it. Despite these risks, we believe that as proposed it’s still likely to be significantly net positive (but before starting, it would be worth doing a BOTEC - a back-of-the-envelope calculation!).
Privacy risks
Detailed AI chip tracking risks unnecessarily infringing people’s rights to privacy, as well as creating a wasteful regulatory burden. This can be mitigated by:
- excluding AI chip tracking in low-risk circumstances, such as small purchases by individuals for purposes unrelated to AI
- declassifying low risk chips over time, to avoid excess tracking
- only requiring more intrusive compute governance measures for larger scale deployments or high-end AI chips
Only focusing on high-end data centre AI chips excludes 99.99974% of semiconductor chips, limiting privacy risks.
Promoting arms race dynamics
Transparent AI chip registers theoretically reduce arms race incentives by providing mutual visibility into rivals’ capabilities. However, this may backfire if a state has significantly more AI chips than another, provoking fear or political pressure to ‘catch up’.
Managing these situations is likely to be difficult. Careful framing of this information before release, encouraging collaboration or negotiation between states, would likely be necessary to minimise fallout.
Other similar agreements have been thought to generally reduce tensions between states. For example:
- The Treaty on Open Skies, where member states grant others the permission to fly observation aircraft over their territory to gather information on their military forces, with the idea that greater transparency can reassure countries that potential adversaries are not about to go to war.
- The IAEA carries out inspections of civilian nuclear sites, which sometimes unearth non-compliance with nuclear weapons agreements. So far, it has generally seemed able to flag issues effectively to encourage compliance, without escalating them into arms races.
Barriers to entry
In general, introducing regulations creates some additional burden on organisations operating within the area. Additionally, this often affects new entrants the most - as they don’t have the existing resources to absorb the compliance cost.
One proposed method for encouraging compliance was a deposit scheme. While this aligns incentives, it increases the capital needed to purchase AI chips and thus could discourage new startups in the area. This could exacerbate the risk of concentrating power in the hands of the few organisations that currently have the capital to build state-of-the-art AI models. If used, the deposit scheme contribution amounts would need to balance reducing dangerous AI model training against this risk.
Security risks
A register with details about high-end AI chips raises security concerns.
Even without location data, it is likely possible to infer when chips are being transported by cross-referencing register data with other OSINT data, like ship, plane or train tracking databases. This might help adversaries steal valuable chips that are potentially dangerous in the wrong hands. Further investigation is necessary to validate whether this is a credible threat (as this might already be possible, or the register might not meaningfully help). If it is a risk, it might be mitigated by delaying public release of the data, or redacting data about particularly vulnerable points in the supply chain.
Extended versions of the register with more location data will pose greater risks. Governments are likely to be hesitant to publish locations of secure facilities with AI chips as this could make them more vulnerable to attacks or sabotage. This more sensitive information might be aggregated, or only selectively disclosed to trusted partners.
Governance at Scale
Running any large international organisation poses significant challenges due to the number of stakeholders involved. Each member country brings varying geopolitical interests, creating a complex landscape to navigate.
Additionally, running the organisation's operations is likely to be challenging. In-person inspections might necessitate operating in many different countries, and the technical nature of AI research will likely make finding qualified technical staff difficult.
Request for feedback
This is one of my first public blog posts on AI governance. I’d be keen to receive feedback via this form. Also if anyone knows how to replicate the toggleable details blocks in the original post on LessWrong do let me know!
Acknowledgements
Thanks to Rudolf Laine for reviewing and providing feedback on an early draft of this document. All errors are still very much my own!
Cover photo by Patrik Kernstock.
- ^
Shavit Y. What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring (2023).
- ^
Baker M. Nuclear Arms Control Verification and Lessons for AI Treaties (2023).
- ^
The UK standards for sensitive information require that any form of digital memory is destroyed by guillotining, disintegration, hammer-milling, shredding, incineration or smelting (Annex A). Most AI chips will have some form of memory built-in, requiring their destruction (e.g. graphics cards are given as an example in Annex B). For example, NVIDIA’s blog post on their H100 chip explains it has on-chip L1 and L2 caches (SRAM), and on-die HBM3 memory (DRAM).
2 comments
Comments sorted by top scores.
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-04T15:54:25.982Z · LW(p) · GW(p)
This is a solid seeming proposal. If we are in a world where the majority of danger comes from big datacenters and large training runs, I predict that this sort of regulation would be helpful. I don't think we are in that world though, which I think limits how useful this would be. Further explanation here: https://www.lesswrong.com/posts/sfWPjmfZY4Q5qFC5o/why-i-m-doing-pauseai?commentId=p2avaaRpyqXnMrvWE [LW(p) · GW(p)]
Replies from: domdomegg
↑ comment by Adam Jones (domdomegg) · 2024-05-05T00:05:43.514Z · LW(p) · GW(p)
Thanks for the feedback! The article does include some bits on this, but I don't think LessWrong supports toggle block formatting.
I think individuals probably won't be able to train models themselves that pose advanced misalignment threats before large companies do. In particular, I think we disagree about how likely we think it is that there's some big algorithmic efficiency trick someone will discover that enables people to leap forward on this (I don't think this will happen, I think you think this will).
But I do think the catastrophic misuse angle seems fairly plausible - particularly from fine-tuning. I also think an 'incompetent takeover'[1] might be plausible for an individual to trigger. Both of these are probably not well addressed by compute governance (except maybe by stopping large companies releasing the weights of the models for fine-tuning by individuals).
- ^
I plan to write more up on this: I think it's generally underrated as a concept.