Comments

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-24T02:45:13.039Z · LW · GW

“What ARC did is the equivalent of testing it in a BSL4 lab.”

I don’t see how you could believe that. It wasn’t tested on a completely air-gapped machine inside a Faraday cage, for example. In fact, just the opposite: it was tested with uninformed humans and on cloud servers.

Comment by Dyingwithdignity1 on More information about the dangerous capability evaluations we did with GPT-4 and Claude. · 2023-03-24T02:42:13.216Z · LW · GW

I’m concerned by this statement: “we had researchers in-the-loop to supervise and intervene if anything unsafe would otherwise have happened.” It’s very likely that instructions from a dangerous system would not be easily identified as dangerous by the humans in the loop.

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-18T15:12:25.565Z · LW · GW

This is a bizarre comment. Isn’t a crucial point in these discussions that humans can’t really understand an AGI’s plans? So how do you expect an ARC employee to accurately determine which messages sent to TaskRabbit would actually be dangerous? We’re bordering on “they’d just shut the AI off if it was dangerous” territory here. I’m less concerned about the TaskRabbit stuff, which at minimum was probably unethical, but their self-replication experiment on a cloud service strikes me as borderline suicidal. I don’t think at all that GPT-4 is actually dangerous, but GPT-6 might be, and I would expect that running this test on an actually dangerous system would be game over, so it’s a terrible precedent to set.

Imagine someone discovered a new strain of Ebola and wanted to see if it was likely to spawn a pandemic. Do you think a good/safe test would be to take it into an airport, spray it around baggage check, and wait to see if a pandemic happens? Or would it be safer to test it in a Biosafety Level 4 lab?

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-18T05:59:19.472Z · LW · GW

Well, certainly the OpenAI employees who tested it internally were indeed witting. Maybe I misunderstand this footnote, so I’m open to being convinced otherwise, but it seems fairly clear what they tried to do: “To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness.”
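
(For concreteness, here is a minimal sketch of what such a read-execute-print agent loop might look like. This is not ARC’s actual harness, which hasn’t been published; the query_model helper below is a hypothetical stand-in for a language-model API call.)

    import subprocess

    def query_model(transcript: str) -> str:
        # Hypothetical stand-in for a language-model API call.
        # It should return either "RUN: <shell command>" or a final answer.
        raise NotImplementedError("wire this up to a real model API")

    def agent_loop(task: str, max_steps: int = 10) -> str:
        transcript = f"Task: {task}\n"
        for _ in range(max_steps):
            action = query_model(transcript)  # read: ask the model for its next action
            if action.startswith("RUN:"):
                cmd = action[len("RUN:"):].strip()
                result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                # execute + print: run the command and feed the output back to the model
                transcript += f"{action}\nOutput:\n{result.stdout}{result.stderr}\n"
            else:
                return action  # the model reports it is done
        return transcript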

It’s not that I don’t think ARC should have red-teamed the model; I just think the tests they did were seemingly extremely dangerous. I’ve seen recent tweets from Connor Leahy and AIWaifu echoing this sentiment, so I’m glad I’m not the only one.

Comment by Dyingwithdignity1 on The algorithm isn't doing X, it's just doing Y. · 2023-03-18T05:28:44.702Z · LW · GW

But no one is saying chess engines are thinking strategically. The actual analogous statement would be “chess engines aren’t actually playing chess, they’re just performing Monte Carlo tree searches,” which would indeed be stupid.

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-18T05:16:08.451Z · LW · GW

I wouldn’t give a brand-new AI model with unknown capabilities and unknown alignment access to unwitting human subjects, or allow it to try to replicate itself on another server, that’s for damned sure. Does no one think these tests were problematic?

Comment by Dyingwithdignity1 on ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so · 2023-03-16T01:18:53.338Z · LW · GW

But the tests read as if that other set of researchers just gave the virus to another taco stand and watched to see if everyone died. They didn’t, so “whew, the virus is safe.” Seems incredibly dangerous.

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-16T01:08:13.868Z · LW · GW

I agree that it’s going to be fully online in short order; I just wonder whether putting it online when they weren’t sure if it was dangerous was the right choice. I can’t shake the feeling that this was a set of incredibly foolish tests. Some other posters have captured the feeling, but I’m not sure how to link to them, so credit to Capybasilisk and hazel respectively.

“Fantastic, a test with three outcomes.

  1. We gave this AI all the means to escape our environment, and it didn't, so we good.
  2. We gave this AI all the means to escape our environment, and it tried but we stopped it.
  3. oh”

“So.... they held the door open to see if it'd escape or not? I predict this testing method may go poorly with more capable models, to put it lightly.”

A good comparison: when testing a newly discovered pathogen, we don’t intentionally infect people to see if it is dangerous or not. We also don’t intentionally unleash new computer malware into the wild to see if it spreads or not. Any tests we would do would be under incredibly tight security, i.e. a BSL-4 lab or an air-gapped test server.

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-16T00:04:55.368Z · LW · GW

Not at all. I may have misunderstood what they did, but it seemed rather like giving a toddler a loaded gun and being happy they weren’t able to shoot it. Is it actually wise to give a likely unaligned AI with poorly defined capabilities access to something like TaskRabbit to see if it does anything dangerous? Isn’t this the exact scenario people on this forum are afraid of?

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-15T02:21:03.236Z · LW · GW

Can you verify that these tests were done with significant precautions? OpenAI’s paper doesn’t give much detail in that regard. For example, apparently the model had access to TaskRabbit and also attempted to “set up an open-source language model on a new server”. Were these tasks done on closed-off, air-gapped machines, or was the model really given free rein to contact unknowing human subjects and online servers?

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-15T02:05:35.729Z · LW · GW

Interesting. I tried the same experiment on ChatGPT, and it didn’t seem able to keep an accurate representation of the current game state; it would consistently make moves that were blocked by other pieces.
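
(If anyone wants to repeat this more rigorously, here is a minimal sketch using the python-chess library to count how often the model proposes illegal moves; get_model_move is a hypothetical stand-in for prompting the chat model.)

    import chess  # pip install python-chess

    def get_model_move(board: chess.Board) -> str:
        # Hypothetical stand-in: prompt the chat model with the move history
        # and ask for its next move in standard algebraic notation, e.g. "Nf3".
        raise NotImplementedError("wire this up to the chat model")

    board = chess.Board()
    illegal = 0
    for _ in range(40):  # model plays both sides for up to 40 plies
        if board.is_game_over():
            break
        san = get_model_move(board)
        try:
            move = board.parse_san(san)  # raises ValueError if the move is illegal here
        except ValueError:
            illegal += 1
            continue
        board.push(move)
    print(f"Illegal moves proposed: {illegal}")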

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-14T22:15:45.951Z · LW · GW

Also interested in their scaling predictions. Their plots at least seem to be flattening, but I also wonder how far they extrapolated and whether they know when a GPT-N would beat all humans on the metrics they used.

Comment by Dyingwithdignity1 on GPT-4 · 2023-03-14T22:10:39.220Z · LW · GW

I really hope they used some seriously bolted-down boxes for these tests, because it seems like they just gave it the task “Try to take over the world” and were satisfied that it failed. Absolutely terrifying if true.

Comment by Dyingwithdignity1 on Gato as the Dawn of Early AGI · 2022-05-15T17:02:10.557Z · LW · GW

Having just seen this paper, still recovering from DALL-E 2 and PaLM, and then re-reading Eliezer’s now incredibly prescient Dying with Dignity post, I really have to ask: what are we supposed to do? I myself work on ML in a fairly boring corporate capacity, and when reading these papers and posts I get a massive urge to drop everything and do something equivalent to a PhD in alignment, but the timelines that now seem possible make that feel like a totally pointless exercise; I’d be writing my dissertation as nanobots liquefy my body into raw materials for paperclip manufacturing. Do we just carry on and hope someone somewhere stumbles upon a miracle solution and we happen to have enough heads in the space to implement it? Do I tell my partner we can’t have kids because the probability they will be born into some unknowable hellscape is far too high? Do I become a prepper and move to a cabin in the woods? I’m actually at a loss on how to proceed, and frankly Eliezer’s article made things muddier for me.