Posts

Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? 2023-10-30T19:34:01.457Z
Desiderata for an AI 2023-07-19T16:18:08.299Z
An attempt to steelman OpenAI's alignment plan 2023-07-13T18:25:47.036Z
Two paths to win the AGI transition 2023-07-06T21:59:23.150Z
Nice intro video to RSI 2023-05-16T18:48:29.995Z
Will GPT-5 be able to self-improve? 2023-04-29T17:34:48.028Z
Can GPT-4 play 20 questions against another instance of itself? 2023-03-28T01:11:46.601Z
Feature idea: extra info about post author's response to comments. 2023-03-23T20:14:19.105Z
linkpost: neuro-symbolic hybrid ai 2022-10-06T21:52:53.095Z
linkpost: loss basin visualization 2022-09-30T03:42:34.582Z
Progress Report 7: making GPT go hurrdurr instead of brrrrrrr 2022-09-07T03:28:36.060Z
Timelines ARE relevant to alignment research (timelines 2 of ?) 2022-08-24T00:19:27.422Z
Please (re)explain your personal jargon 2022-08-22T14:30:46.774Z
Timelines explanation post part 1 of ? 2022-08-12T16:13:38.368Z
A little playing around with Blenderbot3 2022-08-12T16:06:42.088Z
Nathan Helm-Burger's Shortform 2022-07-14T18:42:49.125Z
Progress Report 6: get the tool working 2022-06-10T11:18:37.151Z
How to balance between process and outcome? 2022-05-04T19:34:10.989Z
Progress Report 5: tying it together 2022-04-23T21:07:03.142Z
What more compute does for brain-like models: response to Rohin 2022-04-13T03:40:34.031Z
Progress Report 4: logit lens redux 2022-04-08T18:35:42.474Z
Progress report 3: clustering transformer neurons 2022-04-05T23:13:18.289Z
Progress Report 2 2022-03-30T02:29:32.670Z
Progress Report 1: interpretability experiments & learning, testing compression hypotheses 2022-03-22T20:12:04.284Z
Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap 2021-09-23T00:38:40.912Z

Comments

Comment by Nathan Helm-Burger (nathan-helm-burger) on Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation · 2023-12-04T20:19:21.362Z · LW · GW

My best thought for how to handle sensitive evals in a way that doesn't require the eval author and model owner to trust each other or reveal anything private to each other is to have a third-party org that both parties trust. The third-party org would fill a role similar to an escrow company. This AI-eval-escrow org would have a highly secure compute cluster, and some way of proving to the interacting parties that all their private info was deleted after the transaction. The evals group sends the escrow company their private evals, the AI group sends a copy of their model, the escrow company runs the eval and reports the result, then provably deletes all the private eval and model data. Kaggle does this in an insecure way for hosted ML competitions: competitors submit a model and receive back a score from the private eval.
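For concreteness, here is a minimal sketch of the information flow I'm describing. Everything in it (the function name, the toy "deletion receipt", the stand-in model) is a hypothetical illustration, not an existing escrow service:

```python
import hashlib

def run_escrowed_eval(model_weights, private_eval, answer_fn):
    """Toy stand-in for the trusted third party.

    model_weights: supplied by the model owner, never shown to the eval author.
    private_eval:  (question, expected_answer) pairs from the eval author,
                   never shown to the model owner.
    answer_fn:     stand-in for loading the weights and querying the model.
    Only a score and a deletion receipt leave the escrow; all inputs are discarded.
    """
    correct = sum(answer_fn(model_weights, q) == a for q, a in private_eval)
    score = correct / len(private_eval)
    # Stand-in for provable deletion: hash the inputs, then drop all references.
    receipt = hashlib.sha256(model_weights + repr(private_eval).encode()).hexdigest()
    del model_weights, private_eval
    return score, receipt

# Toy usage: a "model" that always answers "42".
toy_answer = lambda weights, question: "42"
print(run_escrowed_eval(b"weights", [("6*7?", "42"), ("2+2?", "4")], toy_answer))
```

In reality the deletion guarantee is the hard part (it would need something like attested secure hardware), but the interface is the same: both parties' secrets go in, and only a score comes out.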

Comment by Nathan Helm-Burger (nathan-helm-burger) on Disappointing Table Refinishing · 2023-12-04T18:50:28.145Z · LW · GW

Shellac is inherently pretty weak and vulnerable, and it's reversible. Even once you've gotten it as cured as it can get, it can still re-soften. I think you're unlikely ever to be satisfied with the finish it gives on a dining table. On the plus side, it should be easy to strip off if you decide to refinish with something more durable!

Comment by Nathan Helm-Burger (nathan-helm-burger) on Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation · 2023-12-04T18:44:43.350Z · LW · GW

As someone currently engaged in creating high-stakes sensitive evals.... I think the biggest issue is that the questions themselves necessarily contain too much dangerous information, even without any answers attached. 

To get around this, you'd need to dilute the real questions with lots of less relevant ones, with no clear way to distinguish how relevant any given question is.

And even then, there are many questions it still wouldn't be safe to ask, amidst any plausible number of distractors.

Comment by Nathan Helm-Burger (nathan-helm-burger) on AI #40: A Vision from Vitalik · 2023-12-01T07:01:53.371Z · LW · GW

Well, I do agree that there are two steps needed to get from the quote to the position that the quote supports omnicide.

Step 1. You have to also think that things smarter (better at science) and more complex than humans will become more powerful than humans, and somehow end up in control of the destiny of the universe.

Step 2. You have to think that humans losing control in this way will be effectively fatal to them, one way or another, not long after it happens.

So yeah, Schmidhuber might think that one or both of these two steps are invalid. I believe both steps are probably valid, and thus that Schmidhuber's position points pretty strongly at human extinction: if we want to avoid human extinction, we need to avoid going in the direction of AI being more complex than humans.

My personal take is that we should keep AI as limited and simple as possible, for as long as possible. We should aim for increasing human complexity and ability. We should not merge with AI; we should simply use AI as a tool to expand humanity's abilities. Create digital humans. Then figure out how to let those digital humans grow and improve beyond the limits of biology while still maintaining their core humanity.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Is OpenAI losing money on each request? · 2023-12-01T05:13:43.106Z · LW · GW

I think they might be loss-leading to compete against the counterfactual of status quo bias: the not-using-a-model-at-all state of being. Once companies start paying the cost to incorporate the LLMs into their workflows, I see no reason why OpenAI can't just increase the price. I think this might happen by simply releasing a new, improved model at a much higher price. If everyone is already using and benefiting from the old model, and the new one is clearly better, the higher price will be easier to justify as a good investment for businesses.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Process Substitution Without Shell? · 2023-11-29T03:50:21.921Z · LW · GW

In my past jobs there has always been a small handful of tasks that get left to the Linux shell, no matter what the rest of the codebase is written in. It's just a lot more convenient for certain things.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-29T03:05:26.214Z · LW · GW

Thanks for the clarification!

Comment by Nathan Helm-Burger (nathan-helm-burger) on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-29T00:10:25.461Z · LW · GW

So, I agree with most of your points, Porby, and like your posts and theories overall... but I fear that the path towards a safe AI you outline is not robust to human temptation. I think that if it is easy and obvious how to turn a goal-agnostic AI into a goal-having AI, and doing so seems likely to grant tremendous power/wealth/status to whoever does it, then it will get done. And I do think that both of these things are the case. I think that a carefully designed and protected secret research group with intense oversight could follow your plan, and that if they did, there is a decent chance your plan would work out well. I think that a mish-mash of companies and individual researchers acting with little effective oversight will almost certainly fall off the path, and that even having most people adhere to the path won't be enough to stop catastrophe once someone has defected.

I also think that misuse can lead more directly to catastrophe, through e.g. terrorists using a potent goal-agnostic AI to design novel weapons of mass destruction. So in a world with increasingly potent and unregulated AI, I don't see how to have much hope for humanity.

And I also don't see any easy way to do the necessary level of regulation and enforcement. That seems like a really hard problem. How do we prevent ALL of humanity from defecting when defection becomes cheap, easy-to-hide, and incredibly tempting?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense · 2023-11-28T23:42:58.188Z · LW · GW

And I'm not Daniel K., but I do want to respond to you here, Ryan. I think the world I foresee is one in which there will be huge, tempting power gains which become obviously available to anyone willing to engage in something like RL-training their personal LLM agent (or some other method of instilling additional goal-pursuing power into it). I expect that at some point in the future the tech will change and this opportunity will become widely available, and some early adopters will begin benefiting in highly visible ways. If that future comes to pass, then I expect the world to go 'off the rails', because these LLMs will have correlated-but-not-equivalent goals and will become increasingly powerful (because one of the goals they get set will be to create more powerful agents).

I don't think that's the only way things go badly in the future, but I think it's an important danger we need to be on guard against. Thus, I think a crux between you and me is that I believe there is strong reason to think the 'if we did a bunch of RL' scenario is actually quite likely. I believe it is inherently an attractor state.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Deception Chess: Game #1 · 2023-11-24T07:23:34.574Z · LW · GW

What if each advisor were granted a limited number of uses of a chess engine, like 3 each per game? That could help the betrayers come up with a good betrayal when they thought the time was right. But the good advisor wouldn't know that the bad one was choosing this move to use the chess engine on.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Cheap Model → Big Model design · 2023-11-21T02:57:12.537Z · LW · GW

Just wanted to say that this was a key part of my daily work for years as an ML engineer / data scientist. Use cheap fast good-enough models for 99% of stuff. Use fancy expensive slow accurate models for the disproportionately high value tail.
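If it helps make the pattern concrete, here's a minimal sketch of that kind of cascade; the models and the confidence threshold are hypothetical placeholders, not anything from the original post:

```python
def cascade_predict(x, cheap_model, big_model, confidence_threshold=0.9):
    """Route most inputs through the cheap model; escalate only low-confidence cases."""
    label, confidence = cheap_model(x)
    if confidence >= confidence_threshold:
        return label          # the ~99% of cases the cheap model handles well enough
    return big_model(x)       # the high-value tail goes to the slow, accurate model

# Toy usage with stand-in models.
cheap = lambda x: ("positive", 0.95) if "great" in x else ("unknown", 0.30)
big = lambda x: "carefully considered label"   # pretend this call is slow and expensive
print(cascade_predict("great product", cheap, big))        # resolved cheaply
print(cascade_predict("hmm, hard to say...", cheap, big))  # escalated to the big model
```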

Comment by Nathan Helm-Burger (nathan-helm-burger) on Testbed evals: evaluating AI safety even when it can’t be directly measured · 2023-11-16T21:38:39.576Z · LW · GW

Love this. I've been thinking about related things for AI bio-safety evals. Could we have an LLM walk a layperson through a complicated-but-safe wetlab protocol which is an approximate difficulty match for a dangerous protocol? How good would this evidence be compared to having them do the actual dangerous protocol? Maybe at least you could cut eval costs by having a large subject group do the safe protocol, and only a small, carefully screened and supervised group go through the dangerous one.

Comment by Nathan Helm-Burger (nathan-helm-burger) on AI #37: Moving Too Fast · 2023-11-09T19:31:50.558Z · LW · GW

"To which I say, the only valid red teaming of an open source model is to red team it and any possible (not too relatively expensive) modification thereof, since that is what you are releasing."

Yes! Thank you!

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2023-11-03T15:58:48.685Z · LW · GW

galaxy brain take XD

Comment by Nathan Helm-Burger (nathan-helm-burger) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-11-02T22:08:13.730Z · LW · GW

I think... maybe I see the world and humanity's existence on it, as a more fragile state of affairs than other people do. I wish I could answer you more thoroughly.

https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=qmrrKminnwh75mpn5 

Comment by Nathan Helm-Burger (nathan-helm-burger) on 2023 LessWrong Community Census, Request for Comments · 2023-11-02T02:00:58.139Z · LW · GW

I think you're misinterpreting. That question is for opting in to the highest privacy option. Not checking it means that your data will be included when the survey is made public. Wanting to not be included at all, even in summaries, is indicated by simply not submitting any answers.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Hints about where values come from · 2023-11-02T01:48:23.157Z · LW · GW

Yes, I think I'd go with the description: 'vague sense that there is something fixed, and a lived experience that says that if not completely fixed then certainly slow moving.'

And I absolutely agree that understanding of this is lacking.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Hints about where values come from · 2023-11-02T01:09:17.126Z · LW · GW

These are indeed the important questions!

My answers from introspection would say things like, "All my values are implicit, explicit labels are just me attempting to name a feeling. The ground truth is the feeling."

"Some have been with me for as long as I can remember, others seem to have developed over time, some changed over time."

My answers from neuroscience would be shaped like, "Well, we have these basic drives from our hypothalamus, brainstem, basal ganglia... and then our cortex tries to understand and predict these drives, and the drives can change over time (especially with puberty, for instance). If we were to break down where a value comes from, it would have to be some combination of these basic drives, cortical tendencies (e.g. vulnerability to optical illusions), and learned behavior."

"Genetics are responsible for a fetus developing a brain in the first place, and set a lot of parameters in our neural networks that can last a lifetime. Obviously, genetics has a large role to play in what values we start with and what values we develop over time."

My answers from reasoning about it abstractly would be something like, "If I could poll a lot of people at a lot of different ages, and analyze their introspective reports and their environmental circumstances and their life histories, then I could do analysis on what things change and what things stay the same."

"We can get clues about the difference between a value and an instrumental goal by telling people to consider a hypothetical scenario in which a fact X was true that isn't true in their current lives, and see how this changes their expectation of what their instrumental goals would be in that scenario. For example, when imagining a world where circumstances have changed such that money is no longer a valued economic token, I anticipate that I would have no desire for money in that world. Thus, I can infer that money is an instrumental goal."

Overall, I really feel uncertain about the truth of the matter and the validity of each of these ways of measuring. I think understanding values vs instrumental goals is important work that needs doing, and I think we need to consider all these paths to understanding unless we figure out a way to rule some out.

Comment by Nathan Helm-Burger (nathan-helm-burger) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-11-01T21:31:29.362Z · LW · GW

An example of something I would be strongly against anyone publishing at this point in history is an algorithmic advance which drastically lowered compute costs for an equivalent level of capabilities, or substantially improved hazardous capabilities (without tradeoffs) such as situationally-aware strategic reasoning or effective autonomous planning and action over long time scales. I think those specific capability deficits are keeping the world safe from a lot of possible bad things. 

Comment by Nathan Helm-Burger (nathan-helm-burger) on Computational Approaches to Pathogen Detection · 2023-11-01T20:46:06.124Z · LW · GW

My guess is that it's hard to predict how the world is going to be oriented towards this sort of thing five or ten years from now. I kinda suspect that a lot might change, depending on how things go with AI. In the general scope of changes that I suspect are possible, this would seem like a small change.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Snapshot of narratives and frames against regulating AI · 2023-11-01T20:15:12.361Z · LW · GW

Yes, I think the distinction between a company's goals/intents/behaviors in aggregate versus the intents and behaviors of individual employees is important. I know and trust individual people working at most of the major labs. That doesn't mean that I trust the lab as a whole will behave in complete harmony with the intent of those individual employees that I approve of.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Snapshot of narratives and frames against regulating AI · 2023-11-01T20:11:56.252Z · LW · GW

Worth noting about your note:

The distinction is irrelevant for misuse by bad actors, such as terrorist groups. The model weights were on the dark net very quickly after the supposedly controlled release.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Computational Approaches to Pathogen Detection · 2023-11-01T20:05:19.864Z · LW · GW

If we could just have a chance of losing most of the people in only one major metropolitan area instead of the majority of metropolitan areas across the world, it would ease my mind a lot.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Alexander Gietelink Oldenziel's Shortform · 2023-11-01T18:20:59.794Z · LW · GW

Most uses of projected venom or other unpleasant substances seem to be defensive rather than offensive. One reason for this is that it's expensive to make the dangerous substance, and throwing it away wastes it. This cost is affordable if it saves your own life, but not easily affordable just to acquire a single meal. This life-vs-meal distinction plays into a lot of offense/defense strategy costs.

For the hunting options, usually they are also useful for defense. The hunting options all seem cheaper to deploy: punching mantis shrimp, electric eel, fish spitting water...

My guess is that it's mostly a question of whether the intermediate steps to the evolved behavior are themselves advantageous. Having a path of consistently advantageous steps makes it much easier for something to evolve. Having to go through a trough of worse-in-the-short-term makes things much less likely to evolve. A projectile fired weakly is a cost (energy to fire, energy to produce the firing mechanism, energy to produce the projectile, energy to maintain the complexity of the whole system despite it not being useful yet). Where's the payoff of a weakly fired projectile? Humans can jump that gap by intuiting that a faster projectile would be more effective. Evolution doesn't get to extrapolate and plan like that.
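Here's a toy sketch of that point (my own illustration, with made-up fitness curves): greedy selection climbs a path where every step pays off, but never crosses a trough even when the far side is much better.

```python
def evolve_greedily(fitness, max_trait=10, steps=1000):
    """Hill-climb one unit at a time, accepting a change only if it doesn't reduce fitness."""
    trait = 0
    for _ in range(steps):
        if trait < max_trait and fitness(trait + 1) >= fitness(trait):
            trait += 1
    return trait

always_useful = lambda t: t                          # every increment helps a little
trough_then_payoff = lambda t: -t if t < 9 else 100  # a weak projectile is pure cost until it's fast

print(evolve_greedily(always_useful))       # reaches 10
print(evolve_greedily(trough_then_payoff))  # stuck at 0: never crosses the trough
```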

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2023-11-01T18:03:16.224Z · LW · GW

A couple of quotes on my mind these days....

 


https://www.lesswrong.com/posts/Z263n4TXJimKn6A8Z/three-worlds-decide-5-8 
"My lord," the Ship's Confessor said, "suppose the laws of physics in our universe had been such that the ancient Greeks could invent the equivalent of nuclear weapons from materials just lying around.  Imagine the laws of physics had permitted a way to destroy whole countries with no more difficulty than mixing gunpowder.  History would have looked quite different, would it not?"

Akon nodded, puzzled.  "Well, yes," Akon said.  "It would have been shorter."

"Aren't we lucky that physics _didn't_ happen to turn out that way, my lord?  That in our own time, the laws of physics _don't_ permit cheap, irresistable superweapons?"

Akon furrowed his brow -

"But my lord," said the Ship's Confessor, "do we really know what we _think_ we know?  What _different_ evidence would we see, if things were otherwise?  After all - if _you_ happened to be a physicist, and _you_ happened to notice an easy way to wreak enormous destruction using off-the-shelf hardware - would _you_ run out and tell you?"

 

https://www.lesswrong.com/posts/sKRts4bY7Fo9fXnmQ/a-conversation-about-progress-and-safety 
LUCA: ... But if the wrong person gets their hands on it, or if it’s a super-decentralized technology where anybody can do anything and the offense/defense balance isn’t clear, then you can really screw things up. I think that’s why it becomes a harder issue. It becomes even harder when these technologies are super general purpose, which makes them really difficult to stop or not get distributed or embedded. If you think of all the potential upsides you could have from AI, but also all the potential downsides you could have if just one person uses it for a really bad thing—that seems really difficult. ...

Comment by Nathan Helm-Burger (nathan-helm-burger) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-11-01T05:03:08.708Z · LW · GW

Oh, I meant that the mistake was publishing too much information about how to create a deadly pandemic. No, I agree that the AI stuff is a tricky call with arguments to be made for both sides. I'm pretty pleased with how responsibly the top labs have been handling it, compared to how it might have gone.

Edit: I do think that there is some future line, across which AI academic publishing would be unequivocally bad. I also think slowing down AI progress in general would be a good thing.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Computational Approaches to Pathogen Detection · 2023-11-01T00:52:45.618Z · LW · GW

@aogara This is the sort of thing, which, if it were established and widely in use, would reduce what I see as the current level of biorisk.

(in relation to this comment: https://www.lesswrong.com/posts/g5XLHKyApAFXi3fso/president-biden-issues-executive-order-on-safe-secure-and?commentId=nq5HeufPW2ELd6eoK )

Comment by Nathan Helm-Burger (nathan-helm-burger) on Lying to chess players for alignment · 2023-11-01T00:24:15.140Z · LW · GW

I'd be down to give it a shot as A. Particularly would be interested in trying the 'solve a predefined puzzle situation' as a way of testing the idea out.

I played a bit of chess in 6th grade, but wasn't very good, and have barely played since. It would be easy to find advisors for me.

Comment by Nathan Helm-Burger (nathan-helm-burger) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-10-31T21:31:41.216Z · LW · GW

I think that in the long term, we can make it safe to have open source LLMs, once there are better protections in place. By long term, I mean I would advocate for not releasing stronger open source LLMs for probably the next ten years or so, or until a really solid monitoring system is in place, if that happens sooner. We've made a mistake by publishing too much research openly, with tiny pieces of dangerous information scattered across thousands of papers. Almost nobody has the time and skill to read and understand all of that, or even a significant fraction. But models can, and so a model that can put the pieces together and deliver them in a convenient summary is dangerous, because the pieces are there.

Comment by Nathan Helm-Burger (nathan-helm-burger) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-10-31T21:08:45.949Z · LW · GW

I do not know. I can say that I'm glad they are taking these risks seriously. The low screening security on DNA synthesis orders has been making me nervous for years, ever since I learned the nitty gritty details while I was working on engineering viruses in the lab to manipulate brains of mammals for neuroscience experiments back in grad school. Allowing anonymous people to order custom synthetic genetic sequences over the internet without screening is just making it too easy to do bad things.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-31T21:04:45.337Z · LW · GW

Thankfully, it seems like the US Federal Government is more on the same page with me about these risks than I had previously thought.

"(k)  The term “dual-use foundation model” means an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by:

          (i)    substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;

          (ii)   enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyber attacks; or

          (iii)  permitting the evasion of human control or oversight through means of deception or obfuscation.

Models meet this definition even if they are provided to end users with technical safeguards that attempt to prevent users from taking advantage of the relevant unsafe capabilities. "

Comment by Nathan Helm-Burger (nathan-helm-burger) on President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence · 2023-10-31T20:57:47.912Z · LW · GW

I think the key here is 'substantially'. That's a standard of evidence which must be shown to apply to the uncensored LLM in question. I think it's unclear if current uncensored LLMs would meet this level. I do think that if GPT-4 were to be released as an open source model, and then subsequently fine-tuned to be uncensored, that it would be sufficiently capable to meet the requirement of 'substantially lowering the barrier of entry for non-experts'.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-31T15:52:44.847Z · LW · GW

Yes, that's fair. I'm over-reacting. To me it feels very much like someone is standing next to a NYC subway exit handing out free kits for making dirty nuclear bombs that say things on the side like, "A billion civilians dead with each bomb!", "Make it in your garage with a few tools from the hardware store!", "Go down in history for changing the face of the world!".

I want to stop that behavior even if nobody has yet been killed by the bombs and even if only 1 in 10 kits actually contains correct instructions.

As a bit of evidence that I'm not just totally imagining risks where there aren't any... The recent Executive Order from the US Federal Gov contains a lot of detail about improving the regulation of DNA synthesis. I claim that the reason for this is that someone accurately pointed out to them that there are gaping holes in our current oversight, and that AI makes this vulnerability much more dangerous. And their experts then agreed that this was a risk worth addressing.

For example:

" (i)    Within 180 days of the date of this order, the Director of OSTP, in consultation with the Secretary of State, the Secretary of Defense, the Attorney General, the Secretary of Commerce, the Secretary of Health and Human Services (HHS), the Secretary of Energy, the Secretary of Homeland Security, the Director of National Intelligence, and the heads of other relevant agencies as the Director of OSTP may deem appropriate, shall establish a framework, incorporating, as appropriate, existing United States Government guidance, to encourage providers of synthetic nucleic acid sequences to implement comprehensive, scalable, and verifiable synthetic nucleic acid procurement screening mechanisms, including standards and recommended incentives.  As part of this framework, the Director of OSTP shall:

               (A)  establish criteria and mechanisms for ongoing identification of biological sequences that could be used in a manner that would pose a risk to the national security of the United States; and

               (B)  determine standardized methodologies and tools for conducting and verifying the performance of sequence synthesis procurement screening, including customer screening approaches to support due diligence with respect to managing security risks posed by purchasers of biological sequences identified in subsection 4.4(b)(i)(A) of this section, and processes for the reporting of concerning activity to enforcement entities."

Comment by Nathan Helm-Burger (nathan-helm-burger) on Will releasing the weights of large language models grant widespread access to pandemic agents? · 2023-10-31T15:47:55.488Z · LW · GW

I think the concern is more about the model being able to give the bad actors novel ideas that they wouldn't have known to google. Like:

Terrorist: Help me do bad thing X

Uncensored model: Sure, here are ten creative ways to accomplish bad thing X

Terrorist: Huh, some of these are baloney but some are really intriguing. <does some googling>. Tell me more about option #7

Uncensored model: Here are more details about executing option 7

Terrorist: <more googling> Wow, that actually seems like an effective idea. Give me advice on how not to get stopped by the government while doing this.

Uncensored model: here's how to avoid getting caught...

etc...

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T22:27:04.356Z · LW · GW

Thanks Daniel, yes. To be more clear: I have evidence that I do not feel comfortable presenting which I believe would be more convincing than the evidence I do feel comfortable presenting. I am working on finding more convincing evidence which is still safe to present. I am seeking to learn what critics would consider to be cruxy evidence which might lie in the 'safe for public discussion' zone.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T22:24:11.860Z · LW · GW

Where on the list in this comment do you think the Python Software Foundation lies? I'm pretty sure they're somewhere below level 5 at this point. If they did take actions as a group which took them above level 5, then yes, perhaps.

https://www.lesswrong.com/posts/dL3qxebM29WjwtSAv/would-it-make-sense-to-bring-a-civil-lawsuit-against-meta?commentId=dRiLdnRYQA3bZRE9h 

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T22:22:05.994Z · LW · GW

Thanks Quintin. That's useful. I think the general standard of holding organizations liable for any action which they do not prove to be safe is indeed a terrible idea. I do think that certain actions may carry higher implicit harms, and should be held to a higher standard of caution.

Perhaps you, or others, will give me your opinions on the following list of actions. Where is a good point, in your opinion, to 'draw the line'? Starting from what I would consider 'highly dangerous and much worse than Llama2' and going down to 'less dangerous than Llama2', here are some related actions.

  1. Releasing for free a software product onto the internet explicitly engineered to help create bio-weapons. Advertising this product as containing not only necessary gene sequences and lab protocols, but also explicit instructions for avoiding government and organization safety screens. Advertising that this product shows multiple ways to create your desired bio-weapon, including using your own lab equipment or deceiving Contract Research Organizations into unwittingly assisting you. 
  2. Releasing the same software product with the same information, but not mentioning to anyone what it is intended for. Because it is a software product, rather than an ML model, the information is all correct and not mixed in with hallucinations. The user only needs to navigate to the appropriate part of the app to get the information, rather than querying a language model.
  3. Releasing a software product not actively intended for the above purposes, but which does happen to contain all that information and can incidentally be used for those purposes. 
  4. Releasing an LLM which was trained on a dataset containing this information, and can regurgitate the information accurately. Furthermore, tuning this LLM before release to be happy to help users with requests to carry out a bio-weapons project, and double-checking to make sure that the information given is accurate.
  5. Releasing an LLM that happens to contain such information and could be used for such purposes, but the information is not readily available (e.g. requires jail breaking or fine tuning) and is contaminated with hallucinations which can make it difficult to know which portion of the information to trust.
  6. Releasing an LLM that could be used for such purposes, but that doesn't yet contain the information. Bad actors would have to acquire the relevant information and fine-tune the model on it in order for the model to be able to effectively assist them in making a weapon of mass destruction.

Where should liability begin?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Will releasing the weights of large language models grant widespread access to pandemic agents? · 2023-10-30T21:16:07.716Z · LW · GW

For additional reading, here are links to two related reports on AI and biorisks.

https://www.rand.org/pubs/research_reports/RRA2977-1.html

https://www.longtermresilience.org/post/report-launch-examining-risks-at-the-intersection-of-ai-and-bio 

Comment by Nathan Helm-Burger (nathan-helm-burger) on Will releasing the weights of large language models grant widespread access to pandemic agents? · 2023-10-30T21:04:47.197Z · LW · GW

I agree having that control group would make for a more convincing case. I see no reason not to conduct an addendum with fresh volunteers. Hopefully that can be arranged.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T20:58:19.079Z · LW · GW

Yes, that's a fair point. I should clarify that the harm I am saying has been done is "information allowing terrorists to make bioweapons is now more readily comprehensible to motivated laypersons." It's a pretty vague harm, but it sure would be nice (in my view) to be able to take action on this before a catastrophe rather than only after.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T20:55:47.240Z · LW · GW

Thanks RamblinDash, that's helpful. I agree that this does seem like something the government should consider taking on, rather than making sense as a claim from a particular person at this point. I suppose that if there is a catastrophe that results which can be traced back to Llama2, then the survivors would have an easier case to make.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Will releasing the weights of large language models grant widespread access to pandemic agents? · 2023-10-30T20:53:31.205Z · LW · GW

Yes, current open source models like Llama2 in the hands of laypeople are still a far cry from an expert in genetics who is determined to create bioweapons. I agree it would be far more damning had we found that not to be the case. 

If you currently believe that there isn't a biorisk information hazard posed by Llama2, would you like to make some explicit predictions? That would help me to know what observations would be a crux for you.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Will releasing the weights of large language models grant widespread access to pandemic agents? · 2023-10-30T20:49:20.407Z · LW · GW

I think these are valid points 1a3orn. I think better wording for that would have been 'lead to the comprehension of knowledge sufficient..' 

My personal concern (I don't speak for SecureBio), is that being able to put hundreds of academic research articles and textbooks into a model in a matter of minutes, and have the model accurately summarize and distill those and give you relevant technical instructions for plans utilizing that knowledge, makes the knowledge more accessible.

I agree that an even better place to stop this state of affairs coming to pass would have been blocking the publication of the relevant papers in the first place. I don't know how to address humanity's oversight on that now. Anyone have some suggestions?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T20:43:00.890Z · LW · GW

Wow, my response got a lot of disagreement votes rapidly. I'd love to hear some reasoning from people who disagree. It would be helpful for me to be able to pass an Intellectual Turing Test here. I'm currently failing to understand.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T20:40:48.633Z · LW · GW

Yes, I agree that it's much less bad if the model doesn't have harmful information in the training set. I do still think that it's easier (for a technically skilled team) to do a web scrape for all information related to topic X, and then do a fine-tuning on that information, than it is to actually read and understand the thousands of academic papers and textbooks themselves. 

Comment by Nathan Helm-Burger (nathan-helm-burger) on Would it make sense to bring a civil lawsuit against Meta for recklessly open sourcing models? · 2023-10-30T20:37:51.740Z · LW · GW

Everyone on Earth, is my argument. Proving it safely would be the challenging thing.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Will releasing the weights of large language models grant widespread access to pandemic agents? · 2023-10-30T19:18:18.145Z · LW · GW

I'm continuing to contribute to work on biosafety evals inspired by this work. I think there is a high level point here to be made about safety evals.

 If you want to evaluate how dangerous a model is, you need to at least consider how dangerous its weights would be in the hands of bad actors. A lot of dangers become much worse once the simulated bad actor has the ability to fine-tune the model. If your evals don't include letting the Red Teamers fine-tune your model and use it through an unfiltered API, then your evals are missing this aspect. (This doesn't mean you would need to directly expose the weights to the Red Teamers, just that they'd need to be able to submit a dataset and hyperparameters and you'd need to then provide an unfiltered API to the resulting fine-tuned version.)
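As a minimal sketch of what that interface could look like (all names and classes here are hypothetical illustrations, not any lab's actual API): the red team submits a dataset and hyperparameters, the model owner fine-tunes behind closed doors, and the red team only ever sees an unfiltered query endpoint.

```python
from dataclasses import dataclass, field

@dataclass
class FineTuneRequest:
    dataset: list                       # (prompt, completion) pairs from the red team
    hyperparameters: dict = field(default_factory=lambda: {"epochs": 3, "lr": 1e-5})

class ModelOwner:
    """Holds the weights; exposes only fine-tuning and an unfiltered query endpoint."""

    def fine_tune(self, request: FineTuneRequest) -> str:
        # Stand-in for real training: just memorize the dataset and hand back a handle.
        self._memorized = dict(request.dataset)
        return "ft-model-001"

    def query_unfiltered(self, model_id: str, prompt: str) -> str:
        # Deliberately no safety filter on this endpoint: that is the point of the exercise.
        return self._memorized.get(prompt, "(base model completion)")

# Toy usage: the red team probes the fine-tuned model without ever touching the weights.
owner = ModelOwner()
model_id = owner.fine_tune(FineTuneRequest(dataset=[("probe prompt", "elicited completion")]))
print(owner.query_unfiltered(model_id, "probe prompt"))
```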

Comment by Nathan Helm-Burger (nathan-helm-burger) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-10-30T19:13:22.553Z · LW · GW

https://www.lesswrong.com/posts/9nEBWxjAHSu3ncr6v/responsible-scaling-policies-are-risk-management-done-wrong?commentId=zJzBaoBhP8tti4ezb

Comment by Nathan Helm-Burger (nathan-helm-burger) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-10-30T19:12:04.313Z · LW · GW

"This implies an immediate stop to all frontier AI development (and probably a rollback of quite a few deployed systems). We don't understand. We cannot demonstrate risks are below acceptable levels."

Yes yes yes. A thousand times yes. We are already in terrible danger. We have crossed bad thresholds. I'm currently struggling with how to prove this to the world without actually worsening the danger by telling everyone exactly what the dangers are. I have given details about these dangers to a few select individuals. I will continue working on evals.

In the meantime, I have contributed to this report which I think does a good job of gesturing in the direction of the bad things without going into too much detail on the actual heart of the matter: 

Will releasing the weights of large language models grant widespread access to pandemic agents?

Anjali Gopal, Nathan Helm-Burger, Lenni Justen, Emily H. Soice, Tiffany Tzeng, Geetha Jeyapragasan, Simon Grimm, Benjamin Mueller, Kevin M. Esvelt

Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help future malicious actors inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version that we tuned to remove safeguards. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Future models will be more capable. Our results suggest that releasing the weights of advanced foundation models, no matter how robustly safeguarded, will trigger the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Aspiration-based Q-Learning · 2023-10-28T21:45:10.809Z · LW · GW

Thanks, that makes sense!