Proposal for improving the state of alignment research
post by Iknownothing · 2023-11-06T13:55:39.015Z · LW · GW
Goal:
Increase the quality and quantity of AI alignment research, such that there is less chance of human extinction (the goal being 0%).
This relies on the work being done to buy more time actually being fruitful; that work shouldn't be neglected in favour of this proposal.
Proposed Method:
One way to improve the output of AI alignment research is to make it profitable.
This could be done with liability laws, or perhaps an evals-based tax on models that break, or risk breaking, laws and regulations.
Such measures would let smaller companies compete with those that have more compute access by opening another vector of profitability: safety and meeting regulatory standards.
Evidence it's worked elsewhere:
Companies like Matomo can compete with Google Analytics pretty much entirely because of GDPR; it's why I pay for their service rather than use GA's free one.
Regulations such as these would reduce investment in pure capabilities and vastly increase it in good safety research. They could also help draw a clear line between safety research that's actually useful and claims made by someone who is merely well connected and good at talking.
Risks:
It could run the risk of shifting research towards combating near-term harms.
However, I think that if more people overall are working in the AI Safety field, there is a greater chance that some of them are drawn to the problems that must be solved for a scalable alignment solution.
E.g. suppose the percentage of people in AI Safety working on x-risk falls from, say, 70% to 20%, but the total number of people in AI Safety rises from 300 to 30,000 (it should be a lot more, but I'm being cautious). The number working on x-risk then goes from about 210 to 6,000: an increase of roughly 5,800 people, or about 2,800%. We should do this.
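A minimal sketch of that arithmetic, using the illustrative figures above (300 → 30,000 total researchers, 70% → 20% x-risk share); these numbers are hypothetical, not forecasts:

```python
# Illustrative arithmetic for x-risk headcount under the assumed shift.
# All numbers are hypothetical, taken from the example above.

total_before, total_after = 300, 30_000      # total AI Safety researchers
share_before, share_after = 0.70, 0.20       # fraction working on x-risk

xrisk_before = total_before * share_before   # 210
xrisk_after = total_after * share_after      # 6,000

increase = xrisk_after - xrisk_before        # ~5,790 more people on x-risk
pct_increase = increase / xrisk_before * 100 # ~2,757% increase

print(f"x-risk researchers: {xrisk_before:.0f} -> {xrisk_after:.0f} "
      f"(+{increase:.0f}, {pct_increase:.0f}%)")
```

The point is just that the absolute number of people working on x-risk can grow enormously even while their share of the field shrinks.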