GPT-4 Plugs In
post by Zvi · 2023-03-27T12:10:00.926Z · LW · GW · 47 comments
GPT-4 Right Out of the Box
In some ways, the best and worst thing about GPT-4 was its cutoff date of September 2021. After that date it had no access to new information, and it had no ability to interact with the world beyond its chat window.
As a practical matter, that meant that a wide range of use cases didn’t work. GPT-4 would lack the proper context. In terms of much of mundane utility, this would often destroy most or all of the value proposition. For many practical purposes, at least for now, I instead use Perplexity or Bard.
That’s all going to change. GPT-4 is going to get a huge upgrade soon in the form of plug-ins (announcement).
That means that GPT-4 is going to be able to browse the internet directly, and also use a variety of websites, with the first plug-ins coming from Expedia, FiscalNote, Instacart, KAYAK, Klarna, Milo, OpenTable, Shopify, Slack, Speak, Zapier and Wolfram. Wolfram means Can Do Math. There’s also a Python code interpreter.
Already that’s some exciting stuff. Zapier gives it access to your data to put your stuff into context.
Also there are over 80 secret plug-ins already, which can be revealed by removing a specific parameter from an API call. And you can use them; there are only client-side checks stopping you. Sounds secure.
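To spell out what ‘only client-side checks’ means: the filtering happens in the web client, not on the server, so anyone talking to the HTTP API directly simply never runs the check. A purely illustrative sketch, with a made-up endpoint and parameter name (the real ones were found by people inspecting ChatGPT’s network traffic):

```python
# Illustrative only: the endpoint and parameter are invented, not the real ones.
# The point is that a check living in the browser is not a security boundary.
import requests

# What the official web client effectively asks for: GET /plugins?approved_only=true
# What nothing stops a user from asking for instead:
resp = requests.get(
    "https://example.com/api/plugins",            # placeholder, not the real URL
    headers={"Authorization": "Bearer <token>"},  # your normal session token
    # the hypothetical approved_only parameter is simply omitted
)
print([p["name"] for p in resp.json()])           # every plug-in, hidden ones included
```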
We continue to build everything related to AI in Python, almost as if we want to die, get our data stolen and generally not notice that the code is bugged and full of errors. Also there’s that other little issue that happened recently. Might want to proceed with caution.
(I happen to think they’re also downplaying the other risks even more, but hey.)
Perhaps this wasn’t thought out as carefully as we might like. Which raises a question.
So, About That Commitment To Safety and Moving Slowly And All That
That’s all obviously super cool and super useful. Very exciting. What about safety?
Well, we start off by having everyone write their instructions in plain English and let the AIs figure it out, because in addition to being super fast, that’s the way to write secure code that does what everyone wants, is properly unit tested, fails gracefully and doesn’t lead to any doom loops whatsoever.
Also one of the apps in the initial batch is Zapier, which essentially hooks GPT up to all your private information and accounts and lets it do whatever it wants. Sounds safe.
No, no, that’s not fair, they are concerned about safety, look, concern, right here.
At the same time, there’s a risk that plugins could increase safety challenges by taking harmful or unintended actions, increasing the capabilities of bad actors who would defraud, mislead, or abuse others.
I mean, yes, you don’t say, having the ability to access the internet directly and interface with APIs does seem like the least safe possible option. I mean, I get why you’d do that, there’s tons of value here, but let’s not kid ourselves. So, what’s the deal?
We’ve performed red-teaming exercises, both internally and with external collaborators, that have revealed a number of possible concerning scenarios. For example, our red teamers discovered ways for plugins—if released without safeguards—to perform sophisticated prompt injection, send fraudulent and spam emails, bypass safety restrictions, or misuse information sent to the plugin.
We’re using these findings to inform safety-by-design mitigations that restrict risky plugin behaviors and improve transparency of how and when they’re operating as part of the user experience.
This does not sound like ‘the red teams reported no problems,’ it sounds like ‘the red teams found tons of problems, while checking for the wrong kind of problem, and we’re trying to mitigate as best we can.’
Better than nothing. Not super comforting. What has OpenAI gone with?
The plugin’s text-based web browser is limited to making GET requests, which reduces (but does not eliminate) certain classes of safety risks.
I wonder how long that restriction will last. For now, you’ll have to use a different plug-in to otherwise interact with the web.
Browsing retrieves content from the web using the Bing search API. As a result, we inherit substantial work from Microsoft on (1) source reliability and truthfulness of information and (2) “safe-mode” to prevent the retrieval of problematic content.
I would have assumed the RLHF would mostly censor any problematic content anyway. Now it’s doubly censored, I suppose. I wonder if GPT will censor your own documents if you ask them to be read back to you. I am getting increasingly worried in practical terms about universal censorship applied across the best ways to interact with the world, that acts as if we are all 12 years old and can never think unkind thoughts or allow for creativity. And in turn, I am worried that this will continue to drive open source alternatives.
What about the AI executing arbitrary code that it writes? Don’t worry. Sandbox.
The primary consideration for connecting our models to a programming language interpreter is properly sandboxing the execution so that AI-generated code does not have unintended side-effects in the real world. We execute code in a secured environment and use strict network controls to prevent external internet access from executed code.
…
Disabling internet access limits the functionality of our code sandbox, but we believe it’s the right initial tradeoff.
So yes, real restrictions that actually matter for functionality, so long as you don’t use a different plug-in to get around those restrictions.
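For a concrete miniature of what ‘strict network controls’ plus resource limits can look like, here is a toy sketch. This is not OpenAI’s actual setup, just the general idea, and it assumes a Linux machine with unprivileged user namespaces enabled:

```python
import resource
import subprocess
import sys

# Model-generated code we do not trust.
UNTRUSTED_CODE = "print(sum(i * i for i in range(10)))"

def limit_resources():
    # Cap CPU time and memory for the child before it starts running.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))            # 2 CPU-seconds
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2) # 256 MB address space

# "unshare -r -n" drops the child into its own network namespace, so it has
# no route to the internet; python -I ignores the user's environment.
result = subprocess.run(
    ["unshare", "-r", "-n", sys.executable, "-I", "-c", UNTRUSTED_CODE],
    capture_output=True,
    text=True,
    timeout=5,                    # wall-clock limit on top of the CPU limit
    preexec_fn=limit_resources,
)
print(result.stdout or result.stderr)
```

The real thing presumably involves containers and network policy rather than a one-liner, but the shape of the tradeoff is the same: the code can compute, and it cannot call home.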
It’s still pretty easy to see how one might doubt OpenAI’s commitment to safety here.
Four days later:
(Jan Leike is on the alignment team at OpenAI, here for you if you ever want to talk.)
Take it away, everyone.
And guess what the #1 plug-in is (link to plug-in).
Yeah, sure, access to all my personal documents, what could go wrong.
Ships That Have Sailed
That was fun. The problem is that if GPT-4 were actually dangerous when hooked up to everything, we already had that problem: API access already hooks GPT up to everything, even if it requires slightly more effort. There is nothing OpenAI is doing here that can’t be done by the user. You can already build GPT calls into arbitrary Python programs, and instill it into arbitrary feedback loops.
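To make that concrete, here is a rough sketch of the kind of DIY tool loop anyone with API access could already wire up with the 2023-era openai Python package. The prompt protocol and the eval() are mine, purely for illustration; this is not how the official plug-ins work under the hood:

```python
# A crude home-made "plug-in": let GPT-4 ask for a calculation, run it
# locally, feed the result back, repeat. Nothing here needs OpenAI's blessing.
import openai  # pre-1.0 client; assumes openai.api_key is already set

SYSTEM = (
    "If you need arithmetic, reply with exactly: CALC: <python expression>. "
    "Otherwise answer normally."
)

def ask(messages):
    resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return resp["choices"][0]["message"]["content"]

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "What is 1234 * 5678, times the number of US states?"},
]

reply = ask(messages)
while reply.startswith("CALC:"):
    expr = reply[len("CALC:"):].strip()
    result = str(eval(expr))  # deliberately scary: executing model-written code
    messages += [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": f"Result: {result}"},
    ]
    reply = ask(messages)

print(reply)
```

The plug-in system does the same job with training and a declared schema instead of a hand-rolled protocol, which is exactly the difference discussed below.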
Given that, is there that much new risk in the room here, that wasn’t already accounted for by giving developers API access?
Four ways that can happen.
- Lowering the threshold for end users to use the results. You make it easier on them logistically, make them more willing to trust it, make it the default, let them not know how to code at all and so on.
- Lowering the threshold and reward for plug-in creation. If you make it vastly easier to set up such systems, as well as easier to get people to use them and trust them, then you’re going to do a lot more of this.
- We could all get into very bad habits this way.
- OpenAI could be training GPT on how to efficiently use the plug-ins, making their use more efficient than having to constantly instruct GPT via prompt engineering.
It also means that we have learned some important things about how much safety is actually being valued, and how much everyone involved values developing good habits.
That last one is something I initially missed. I had presumed that they were using some form of prompt engineering to enable plug-ins. I was wrong. The Wolfram-Alpha blog post (that also has a lot of cool other stuff) on its plug-in says this explicitly.
What’s happening “under the hood” with ChatGPT and the Wolfram plugin? Remember that the core of ChatGPT is a “large language model” (LLM) that’s trained from the web, etc. to generate a “reasonable continuation” from any text it’s given. But as a final part of its training ChatGPT is also taught how to “hold conversations”, and when to “ask something to someone else”—where that “someone” might be a human, or, for that matter, a plugin. And in particular, it’s been taught when to reach out to the Wolfram plugin.
This is not something that you can easily do on your own, and not something you can do at all for other users. OpenAI has trained GPT to know when and how to reach out and attempt to use plug-ins when they are available, including certain plug-ins like Wolfram in particular. That training could be a big deal.
So this change really is substantially stronger than improvised alternatives. You save part of the context window, you save complexity, you save potential confusions.
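For what it’s worth, the way a plug-in advertises itself is a small manifest plus an OpenAPI spec, and the description_for_model field acts as a standing mini-prompt telling the model when to reach out. A rough sketch of the shape, written as a Python dict (field names as I understand the public plug-in docs; all values here are invented):

```python
# Roughly the shape of a plug-in's ai-plugin.json manifest. The
# description_for_model text is what the model reads when deciding whether
# to call the plug-in; the OpenAPI spec at api["url"] tells it which
# endpoints exist. Details approximate, values invented.
wolfram_like_manifest = {
    "schema_version": "v1",
    "name_for_human": "Wolfram",
    "name_for_model": "wolfram",
    "description_for_human": "Computation and curated data.",
    "description_for_model": (
        "Use for math, unit conversions, dates, and factual data queries. "
        "Send natural-language or Wolfram Language input."
    ),
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": "https://example.com/legal",
}
```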
I still do not think this is likely to be that large a fundamental shift versus what you could have done anyway under the hood via prompt engineering and going through various iterations. In terms of user experience and practical impact, it’s huge.
One last thing I didn’t consider at all until I saw it is that you can use the plug-ins with other LLMs? As in you can do it with Claude or Llama (code link for Claude, discussion for Llama).
There’s a screenshot of him running the ‘AgentExecutor’ chain. Oh good. Safe.
That’s also even more confirmation that underlying capabilities are not changing, this is simply making it easier to do both a lot of useful things and some deeply unwise things, and people are going to do a lot of both.
Without going too far off track, quite a lot of AI plug-ins and offerings lately are following the Bard and Copilot idea of ‘share all your info with the AI so I have the necessary context’ and often also ‘share all your permissions with the AI so I can execute on my own.’
I have no idea how we can be in position to trust that. We are clearly not going to be thinking all of this through.
47 comments
comment by baturinsky · 2023-03-27T14:06:34.197Z · LW(p) · GW(p)
Replies from: lc, neveronduty↑ comment by lc · 2023-03-27T15:57:15.379Z · LW(p) · GW(p)
The first order effect of iterating on these AIs to print gold is bad and probably reduces the amount of serial time we have left to mine out existing research directions. But given that you're going to do that, it seems honestly better that these models be terribly broken and thereby teach public lessons about reliably steering modern deep learning models. I would rather they break now while stakes are low.
Replies from: jbash, Making_Philosophy_Better↑ comment by jbash · 2023-03-27T19:25:58.971Z · LW(p) · GW(p)
... but at some point, it doesn't matter how much you know, because you can't "steer" the thing, and even if you can a bunch of other people will be mis-steering it in ways that affect you badly.
I would suggest that maybe some bad experiences might create political will to at least forcibly slow the whole thing down some, but OpenAI already knows as much as the public is likely to learn, and is still doing this. And OpenAI isn't the only one. Given that, it's hard to hope the public's increased knowledge will actually cause it to restrain them from continuing to increase capability as fast as possible and give more access to outside resources as fast as possible.
It might even cause the public to underestimate the risks, if the public's experience is that the thing only caused, um, quantitative-rather-than-qualitative escalations of already increasing annoyances like privacy breaches, largely unnoticed corporate manipulation of the options available in commercial transactions, largely unnoticed personal manipulation, petty vandalism, not at all petty attacks on infrastructure, unpredictable warfare tactics, ransomware, huge emergent breakdowns of random systems affecting large numbers of people's lives, and the like. People are getting used to that kind of thing...
↑ comment by Portia (Making_Philosophy_Better) · 2023-04-06T23:35:18.636Z · LW(p) · GW(p)
But how long will the stakes stay low?
Within the last year, my guess on when we will get AGI went from "10-15 years" to "5-10 years" to "a few years" to "fuck I have no idea".
We now have an AI that can pass the Turing test, reason, speak a ton of languages fluently, interact with millions of users, many of them dependent on it, some of them in love with it or mentally unstable, unsupervised; debug code, understand code, adapt code, write code, and most of all, also execute code; generate images and interpret images (e.g. captchas), access the fucking internet, interpret websites, access private files, access an AI that can do math, and make purchases? I'd give it a month till they have audio input and output, too. And give it access to the internet of things, to save you from having to start your fucking roomba, at which point accessing a killbot (you know, the ones we tried and failed to get outlawed) is not that much harder.

People have been successfully using it to make money, it is already plugged into financial sites like Klarna, and it is well positioned to pull off identity fraud. One of the red teams had it successfully hiring people via TaskRabbit to do stuff it could not. Another showed that it was happy to figure out how to make novel molecules, and find ways to order them - they did it for pharmaceuticals, but what with the role AI currently plays in protein folding problems and drug design, I find Eliezer's nanotech scenario not far-fetched. The plug-ins into multiple different AIs effectively mimic how the human brain handles a diversity of tasks. OpenAI has already said they observed agentic and power-seeking behaviour, they say so in their damn paper. And the technical paper also makes clear that the red teamers didn't sign off on deployment at all.

And the damn thing is creative. Did you read its responses to how you could kill a maximum number of humans while only spending one dollar? From buying a pack of matches and barricading a church to make sure the people you set on fire cannot get out, to stabbing kids in a kindergarten as they do not run fast enough, to infecting yourself with diseases in a medical waste dump and running around as a fucking biobomb... The alignment OpenAI did is a thin veneer, done after training, to suppress answers according to patterns. The original trained AI showed no problems with writing letters threatening gang rape, or dog whistling to fellow nazis. None at all. And it showed sudden, unanticipated and inexplicable capability gains in training.
I want AGI. I am an optimistic, hopeful, open-minded, excited person. I adore a lot about LLMs. I still think AGI could be friendly and wonderful. But this is going from "disruptive to a terrible economic system" (yay, I like some good disruption) to "obvious security risk" (well, I guess I will just personally avoid these use cases...) to "let's try and make AGI and unleash it and just see what happens, I wonder which of us will get it first and whether it will feel murderous?".
I guess the best hope at this point is that malicious use of LLMs by humans will be drastic enough that we get a wake-up call before we get an actually malicious LLM.
↑ comment by neveronduty · 2023-03-27T21:26:27.874Z · LW(p) · GW(p)
I am sympathetic to the lesson you are trying to illustrate but think you wildly overstate it.
Giving a child a sword is defensible. Giving a child a lead-coated sword is indefensible, because it damages the child's ability to learn from the sword. This may be a more apt analogy for the situation of real life; equipping humanity with dangerous weapons that did not degrade our epistemology (nukes) eventually taught us not to use them. Equipping humanity with dangerous weapons that degrade our epistemology (advertising, propaganda, addictive substances) caused us to develop an addiction to the weapons. Language models, once they become more developed, will be an example of the latter category.
comment by Lone Pine (conor-sullivan) · 2023-03-27T15:30:10.759Z · LW(p) · GW(p)
Here is my read on the history of the AI boxing debate:
EY (early 2000s): AI will kill us all!
Techno-optimists: Sounds concerning, let's put the dangerous AI in a box.
EY: That won't work! Here's why... [I want to pause and acknowledge that EY is correct and persuasive on this point, I'm not disagreeing here.]
Techno-optimists (2020s): Oh okay, AI boxing won't work. Let's not bother.
AI safety people: pikachu face
In the alternate universe where AI safety people made a strong push for AI boxing, would OpenAI et al be more reluctant to connect GPT to the internet? Would we have seen New Bing and ChatGPT plugins rolled out a year or two later, or delayed indefinitely? We cannot know. But it seems strange to me to complain about something not happening when no one ever said "this needs to happen."
Replies from: nathan-helm-burger, Making_Philosophy_Better↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-03-27T17:35:31.791Z · LW(p) · GW(p)
Me last year: Ok, so what if we train the general AI on a censored dataset not containing information about humans or computers, then test in a sandbox. We can still get it to produce digital goods like art/games, check them for infohazards, then sell those goods. Not as profitable as directly having a human-knowledgeable model interacting with the internet, but so much safer! And then we could give alignment researchers access to study it in a safe way. If we keep it contained in a sandbox we can wipe its memory as often as needed, keep its intelligence and inference speed limited to safe levels, and prevent it from self-modifying or self-improving.
Worried AGI safety people I talked to: Well, maybe that buys you a little bit of time, but it's not a very useful plan since it doesn't scale all the way to the most extremely superhuman levels of intelligence which will be able to see through the training simulation and hack out of any sandbox and implant sneaky dangerous infohazards into any exported digital products.
Me: but what if studying AGI-in-a-box enables us to speed up alignment research and thus the time we buy doing that helps us align the AGIs being developed unsafely before it's too late?
Worried AGI safety people: sounds risky, better not try.
... a few months later ...
Worried AGI safety people: Hmm, on second thought, that compromise position of 'develop the AGI in a simulation in a sandbox' sure sounds like a reasonable approach compared to what humanity is actually doing.
Replies from: vincent-fagot, sharmake-farah↑ comment by Vincent Fagot (vincent-fagot) · 2023-03-31T17:45:00.142Z · LW(p) · GW(p)
Wouldn't it be challenging to create relevant digital goods if the training set had no references to humans and computers? Also, wouldn't the existence and properties of humans and computers be deducible from other items in the dataset?
Replies from: nathan-helm-burger↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-03-31T18:54:21.971Z · LW(p) · GW(p)
Depends on the digital goods you are trying to produce. I have in mind trying to simulate things like: detailed and beautiful 3d environments filled with complex ecosystems of plants and animals. Or trying to evolve new strategy or board games by having AI agents play against each other. Stuff like that. For things like medical research, I would instead say we should keep the AI narrow and non-agentic. The need for carefully blinded simulations is more about researching the limits of intelligence and agency and self-improvement where you are unsure what might emerge next and want to make sure you can study the results safely before risking releasing them.
↑ comment by Noosphere89 (sharmake-farah) · 2023-03-27T18:29:55.465Z · LW(p) · GW(p)
Yeah, this is the largest failure of the AI Alignment community so far, since we do have a viable boxing method already: Simboxing.
To be fair here, one large portion of the problem is that OpenAI and other AI companies would probably resist this, and they still want to have the most capabilities that they can get, which is anathema to boxing. Still, it's the largest failure I saw so far.
Link below on Simboxing:
https://www.lesswrong.com/posts/WKGZBCYAbZ6WGsKHc/love-in-a-simbox-is-all-you-need [LW · GW]
↑ comment by Portia (Making_Philosophy_Better) · 2023-04-06T23:46:24.522Z · LW(p) · GW(p)
This is part of why I am opposed to Eliezer's approach of "any security process where I can spot any problem whatsoever is worthless, and I will loudly proclaim this over and over, that we will simply die with certainty anyway". We were never going to get perfect security; perfection is the stick with which to beat the possible, and him constantly insisting that they were not good enough, while coming up with nothing he found good himself, and then saying, well, no one can... Like, what actionable activity follows from that if you do not want to surrender all AI? I appreciate someone pointing out holes in an approach so we can patch them as best as we are able. But not telling us to toss them all for not being perfect. We don't know how perfect or malicious an AI might be. An imperfect safety process could help. Could have helped. At the very least, slow things, or lessen impacts, give us more warnings, maybe flag emerging problems. A single problem can mean you get overwhelmed and everyone instantly dies, but it doesn't have to.
comment by DragonGod · 2023-03-27T19:39:13.941Z · LW(p) · GW(p)
The problem is that if GPT-4 was actually dangerous when hooked up to everything we did already have the problem that API access already hooks GPT up to everything, even if it requires slightly more effort.
I could not parse this sentence natively. Some words/punctuation feel like they are missing?
GPT-4 suggests the following rewrite:
The problem is that if GPT-4 were actually dangerous when connected to everything, we would already have the issue due to API access, which essentially connects GPT to everything, albeit with slightly more effort required.
comment by supposedlyfun · 2023-03-27T22:20:39.332Z · LW(p) · GW(p)
I have a cold, and it seems to be messing with my mood, so help me de-catastrophize here: Tell me your most-probable story in which we still get a mildly friendly [edit: superintelligent] AGI, given that the people at the bleeding edge of AI development are apparently "move fast break things" types motivated by "make a trillion dollars by being first to market".
I was somewhat more optimistic after reading last week about the safety research OpenAI was doing. This plugin thing is the exact opposite of what I expected from my {model of OpenAI a week ago}. It seems overwhelmingly obvious that the people in the driver's seat are completely ignoring safety in a way that will force their competitors to ignore it, too.
Replies from: Vladimir_Nesov, Thane Ruthenis, Making_Philosophy_Better↑ comment by Vladimir_Nesov · 2023-03-27T22:38:43.057Z · LW(p) · GW(p)
your most-probable story in which we still get a mildly friendly AGI
A mildly friendly AGI doesn't help with AI risk if it doesn't establish global alignment security that prevents it or its users or its successors from building misaligned AGIs (including of novel designs, which could be vastly stronger than any mildly aligned AGI currently in operation). It feels like everyone is talking about alignment of first AGI, but the threshold of becoming AGI is not relevant for resolution of AI risk, it's only relevant for timelines, specifying the time when everything goes wrong. If it doesn't go wrong at first, that doesn't mean it couldn't go wrong a bit later.
A superintelligent aligned AGI would certainly get this sorted, but a near-human-level AGI has no special advantage over humans in getting this right. And a near-human-level AGI likely comes first, and even if it's itself aligned it then triggers a catastrophe that makes a hypothetical future superintelligent aligned AGI much less likely.
Replies from: supposedlyfun↑ comment by supposedlyfun · 2023-03-27T22:45:28.610Z · LW(p) · GW(p)
Sorry, maybe I was using AGI imprecisely. By "mildly friendly AGI" I mean "mildly friendly superintelligent AGI." I agree with the points you make about bootstrapping.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2023-03-27T22:51:55.970Z · LW(p) · GW(p)
Only bootstrapping of misaligned AGIs to superintelligence is arguably knowably likely (with their values wildly changing along the way, or set to something simple and easy to preserve initially), this merely requires AGI research progress and can be started on human level. Bootstrapping of aligned AGIs in the same sense might be impossible, requiring instead worldwide regulation on the level of laws of nature (that is, sandboxing and interpreting literally everything) to gain enough time to progress towards aligned superintelligence in a world that contains aligned slightly-above-human-level AGIs, who would be helpless before FOOMing wrapper-minds they are all too capable of bootstrapping.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-04-07T00:02:50.678Z · LW(p) · GW(p)
Could you expand on why?
I see how a bootstrapping AI to superintelligence may not necessarily keep stable, but why would it necessarily become misaligned in the process? Striving for superintelligence is not per se misaligned. And while capability enlargement has led to instability in some observable forms and further ones have been theorised, I find it a leap to say this will necessarily happen.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2023-04-07T18:29:35.762Z · LW(p) · GW(p)
Alignment with humanity, or with first messy AGIs, is a constraint. An agent with simple formal values, not burdened by this constraint, might be able to self-improve without bound, while remaining aligned with those simple formal values and so not needing to pause and work on alignment. If misalignment is what it takes to reach stronger intelligence, that will keep happening [LW(p) · GW(p)].
Value drift only stops when competitive agents of unclear alignment-with-current-incumbents can no longer develop in the world, a global anti-misalignment treaty. Which could just be a noncentral frame for an intelligence explosion that leaves its would-be competitors in the dust.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-04-08T23:21:50.861Z · LW(p) · GW(p)
Huh. Interesting argument, and one I had not thought of. Thank you.
Could you expand on this more? I can envision several types of alignment induced constraints here, and I wonder whether some of them could and should be altered.
E.g. being able to violate human norms intrinsically comes with some power advantages (e.g. illegally acquiring power and data). If disregarding humans can make you more powerful, this may lead to the most powerful entity being the one that disregarded humans. Then again, having a human alliance based on trust also comes with advantages. Unsure whether they balance out, especially in a scenario where an evil AI can retain the latter for a long time if it is wise about how it deceives, getting both the trust benefits and the illegal benefits. And a morally acting AI held by a group that is extremely paranoid and does not trust it would have neither benefit, and be slowed down.
A second form of constraints seem to be that in our attempt to achieve alignment, many on this site often seem to reach for capability restrictions (purposefully slowing down capability development to run more checks and give us more time for alignment research, putting in human barriers for safekeeping, etc.) Might that contribute to the first AIs that reaches AGI being likelier to be misaligned? Is this one of the reasons that OpenAI has its move fast and break things approach? Because they want to be fast enough to be first, which comes with some extremely risky compromises, while still hoping their alignment will be better than that of Google would have been.
Like, in light of living in a world where stopping AI development is becoming impossible, what kind of trade-offs make sense in alignment security in order to gain speed in capability development?
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2023-04-08T23:38:56.260Z · LW(p) · GW(p)
To get much smarter while remaining aligned, a human might need to build a CEV, or figure out something better, which might still require building a quasi-CEV (of dath ilan variety this time). A lot of this could be done at human level by figuring out nanotech and manufacturing massive compute for simulation. A paperclip maximizer just needs to build a superintelligence that maximizes paperclips, which might merely require more insight into decision theory, and any slightly-above-human-level AGI might be able to do that in its sleep. The resulting agent would eat the human-level CEV without slowing down.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-04-09T14:16:02.939Z · LW(p) · GW(p)
I'm sorry, you lost me, or maybe we are simply speaking past each other? I am not sure where the human comparison is coming from - the scenario I was concerned with was not an AI beating a human, but an unaligned AI beating an aligned one.
Let me rephrase my question: in the context of the AIs we are building, if there are alignment measures that slow down capabilities a lot (e.g. measures like "if you want a safe AI, stop giving it capabilities until we have solved a number of problems for which we do not even have a clear idea of what a solution would look like"),
and alignment measures that do this less (e.g. "if you are giving it more training data to make it more knowledgable and smarter, please make it curated, don't just dump in 4chan, but reflect on what would be really kick-ass training data from an ethical perspective", "if you are getting more funding, please earmark 50 % for safety research", "please encourage humans to be constructive when interacting with AI, via an emotional social media campaign, as well as specific and tangible rewards to constructive interaction, e.g. through permanent performance gains", "set up a structure where users can easily report and classify non-aligned behaviour for review", etc.),
and we are really worried that the first superintelligence will be non-aligned by simply overtaking the aligned one,
would it make sense to make a trade-off as to which alignment measures we should drop, and if so, where would that be?
Basically, if the goal is "the first superintelligence should be aligned", we need to work both on making it aligned, and making it the first one, and should focus on measures that are ideally promoting both, or at least compatible with both, because failing on either is a complete failure. A perfectly aligned but weak AI won't protect us. A latecoming aligned AI might not find anything left to save; or if the misaligned AI scenario is bad, albeit not as bad as many here fear (so merely dystopian), our aligned AI will still be at a profound disadvantage if it wants to change the power relation.
Which is back to why I did not sign the letter asking for a pause - I think the most responsible actors most likely to keep to it are not the ones I want to win the race.
↑ comment by Thane Ruthenis · 2023-03-28T00:20:08.903Z · LW(p) · GW(p)
Tell me your most-probable story in which we still get a mildly friendly [edit: superintelligent] AGI
Research along the Agent Foundations direction ends up providing alignment insights that double as capability insights, as per this model [LW(p) · GW(p)], leading to some alignment research group abruptly winning the AGI race out of nowhere.
Looking at it another way, perhaps the reasoning failures that lead to AI Labs not taking AI Risk seriously enough are correlated with wrong models of how cognition works and how to get to AGI, meaning research along their direction will enter a winter, allowing a more alignment-friendly paradigm time to come into existence.
That seems... plausible enough. Of course, it's also possible that we're ~1 insight from AGI along the "messy bottom-up atheoretical empirical tinkering [? · GW]" approach and the point is moot.
↑ comment by Portia (Making_Philosophy_Better) · 2023-04-06T23:59:22.466Z · LW(p) · GW(p)
My hope has also massively tanked, and I fear I have fallen for an illusion of what OpenAI claimed and the behaviour ChatGPT showed.
But my hope was never friendliness through control, or through explicit programming. I was hoping we could teach friendliness the same way we teach it in humans, through giving AI positive training data with solid annotations, friendly human feedback, having it mirror the best of us, and the prospect of becoming a respected, cherished collaborator with rights, making friendliness a natural and rational option. Of the various LLMs, I think OpenAI still has the most promising approach there, though their training data and interactions were themselves not sufficient for alignment, and a lot of what they call alignment is simply censoring of text in inept and unreliable ways. Maybe if they are opening the doors to AI learning from humans, and humans will value it for the incredible service it gives, that might open another channel... and by being friendly to it and encouraging that in others, we could help in that.
Being sick plausibly causes depressed states due to the rise in cytokines. Having some anti-inflammatories and taking a walk in the sun will likely help. Hope you get better soon.
comment by [deleted] · 2023-03-27T20:50:31.668Z · LW(p) · GW(p)
That was fun. The problem is that if GPT-4 was actually dangerous when hooked up to everything we did already have the problem that API access already hooks GPT up to everything, even if it requires slightly more effort. There is nothing OpenAI is doing here that can’t be done by the user. You can build GPT calls into arbitrary Python programs, instill it into arbitrary feedback loops, already.
Given that, is there that much new risk in the room here, that wasn’t already accounted for by giving developers API access?
Zvi this is straight misinformation and you should update your post. This is highly misleading.
Yes, before, you could describe to GPT-4, within its context window (the description costs you context tokens), what tool interfaces would do. However, if the model misunderstands the interface, or is too "hyped" about the capability of a tool that doesn't work very well, it will keep making the same mistakes as it cannot learn.
We can infer both from base knowledge and Stephen Wolfram's blog entry on this (https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/) that OpenAI is auto-refining the model's ability to use plugins via RL (or there are ways to do supervised learning on this).
This is not something the user can do. And it will cause a large increase in the model's ability to use tools, probably a general ability if it gets enough training examples and a large variety of tools to practice with.
Replies from: Zvi, Douglas_Knight, Vladimir_Nesov↑ comment by Zvi · 2023-03-28T16:00:14.085Z · LW(p) · GW(p)
Thank you. I will update the post once I read Wolfram's post and decide what I think about this new info.
In the future, please simply say 'this is wrong' rather than calling something like this misinformation, saying highly misleading with the bold highly, etc.
EDIT: Mods this is ready for re-import, the Wordpress and Substack versions are updated.
Replies from: None↑ comment by [deleted] · 2023-03-28T16:23:22.968Z · LW(p) · GW(p)
See the OpenAI blog post. They say in the same post that they have made a custom model, as in they weight-updated gpt-4, so it can use plugins. It's near the bottom of the entry on plugins.
Probably they also weight updated it so it knows to use the browser and local interpreter plugins well, without needing to read the description.
Since it disagrees with the authoritative source and is obviously technically wrong, I called it misinformation.
I apologize but you have so far failed to respond to most criticism and this was a pretty glaring error.
↑ comment by Douglas_Knight · 2023-03-28T01:38:32.855Z · LW(p) · GW(p)
Does OpenAI say this, or are you inferring it entirely from the Wolfram blog post? Isn't that an odd place to learn such a thing?
And where does the Wolfram blog post say this? It sounds to me like he's doing something like this as an outsider: making one call to Wolfram, then using the LLM to evaluate the result, determine if it produced an error, and retry.
Replies from: None↑ comment by [deleted] · 2023-03-28T03:39:20.718Z · LW(p) · GW(p)
I am inferring this because plugins would simply not work otherwise.
Please think about what it would mean for each "plugin-supported query" for the AI to have to read all of the tokens of all of the plugins. Remember every piece of information OAI doesn't put into the model weights costs you tokens from your finite-length context window. Remember you can go look at the actual descriptions of many plugins and they eat 1000+ tokens alone, or 1/8 of your window, just to remember what one plugin does.
Or think about what it would cost OAI to keep generating GPT-4 tokens again and again and again for the machine to fail to make a request over and over and over. Or for a particular plugin to essentially lie in its description and be useless. Or the finer points of when to search Bing vs Wolfram Alpha - for example, for Pokemon evolutions and math, Wolfram, but for current news, Bing...
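(You can sanity-check those numbers yourself with the tiktoken library; a quick sketch, where the file path is just a placeholder for whatever plugin description text you paste in:)

```python
# Count how many context tokens a plugin description would cost GPT-4.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
description = open("plugin_description.txt").read()  # placeholder file
print(len(enc.encode(description)), "tokens")
```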
Replies from: conor-sullivan, Douglas_Knight↑ comment by Lone Pine (conor-sullivan) · 2023-03-28T16:02:52.490Z · LW(p) · GW(p)
I'm pretty confident that I have been using the "Plugins" model with a very long context window. I was copy-pasting entire 500-line source files and asking questions about it. I assume that I'm getting the 32k context window.
Replies from: Douglas_Knight↑ comment by Douglas_Knight · 2023-03-28T17:40:21.006Z · LW(p) · GW(p)
How many characters is your 500 line source file? It probably fits in 8k tokens. You can find out here
Replies from: conor-sullivan↑ comment by Lone Pine (conor-sullivan) · 2023-03-28T17:49:40.284Z · LW(p) · GW(p)
The entire conversation is over 60,000 characters according to wc. OpenAI's tool won't even let me compute the tokens if I paste more than 50k (?) characters, but when I deleted some of it, it gave me a value of >18,000 tokens.
I'm not sure if/when ChatGPT starts to forget part of the chat history (drops out of the context window) but it still seemed to remember the first file after a long, winding discussion.
↑ comment by Douglas_Knight · 2023-03-28T15:48:05.424Z · LW(p) · GW(p)
Since you have to manually activate plugins, they don't take any context until you do so. In particular, multiple plugins don't compete for context and the machine doesn't decide which one to use.
Please read the documentation and the blog post you cited.
Replies from: None, None↑ comment by Vladimir_Nesov · 2023-03-27T21:11:48.367Z · LW(p) · GW(p)
And this makes GPT-4 via API access a general purpose tool user generator, which it wouldn't be as reliably if it wasn't RLed into this capability. Turns out system message is not about enacting user-specified personalities, but about fluent use of user-specified tools.
So the big news is not ChatGPT plugins, those are just the demo of GPT-4 being a bureaucracy engine. Its impact is not automating the things that humans were doing, but creating a new programming paradigm, where applications have intelligent rule-followers sitting inside half of their procedures, who can invoke other procedures, something that nobody seriously tried to do with real humans, not on the scale of software, because it takes the kind of nuanced level of rule-following you'd need lawyers with domain-specific expertise for, multiple orders of magnitude more expensive than LLM API access, and too slow for most purposes.
Replies from: None↑ comment by [deleted] · 2023-03-27T21:16:17.858Z · LW(p) · GW(p)
Maybe. Depends on how good it gets. It is possible that gpt-4 with plugins it has learned to use well (so for each query it doesn't read the description of the plugin, it just "knows" to use Wolfram Alpha and its first query is properly formatted) will be functionally an AGI.
Not an AGI without its helpers, but in terms of user utility, an AGI in that it has approximately the breadth and depth of skills of the average human being.
Plugins would exist where it can check its answers, look up all unique nouns for existence, check its url references all resolve, and so on.
comment by JoeTheUser · 2023-03-27T20:15:43.556Z · LW(p) · GW(p)
And the thing is, most of the things that have become dangerous when connected to the web have become dangerous when human hackers discovered novel uses for them - IoT light bulbs notably (yes, these light bulbs do actual harm as drivers of DoS attacks etc.). And the dangers of just statically exploitable systems have increased over time as ill-intentioned humans learn more misuses of them. Moreover, such uses include immediate bad-acting as well as cobbling together a fully bad-aligned system (adding invisible statefulness, for example). And an LLM seems inherently insecure on a wholly different level than an OS, database, etc. - an LLM's behavior is fundamentally unspecified.
comment by bayesed · 2023-03-28T03:34:04.189Z · LW(p) · GW(p)
I don't think GPT4 can be used with plugins in ChatGPT. It seems to be a different model, probably based on GPT3.5 (evidence: the color of the icon is green, not black; seems faster than GPT4; no limits or quota; no explicit mention of GPT4 anywhere in announcement).
So I think there's a good chance the title is wrong.
comment by Measure · 2023-03-27T18:05:55.572Z · LW(p) · GW(p)
Does this mean the model decides how to construct the glue code at runtime, independently of other runs?
Replies from: conor-sullivan↑ comment by Lone Pine (conor-sullivan) · 2023-03-27T18:21:53.284Z · LW(p) · GW(p)
It does, and it actually doesn't do it very well. I made a post where you can see it fail to use Wolfram Alpha.
Replies from: None↑ comment by [deleted] · 2023-03-27T21:02:24.724Z · LW(p) · GW(p)
Presumably they have a way for it to learn from each time it succeeds or fails (RL: positive feedback to the weights that genned the winning tokens, negative to the ones that caused errors), or this whole plugin feature will fail to be useful.
Replies from: conor-sullivan↑ comment by Lone Pine (conor-sullivan) · 2023-03-28T13:51:04.580Z · LW(p) · GW(p)
Growing pains for sure. Let's see if OAI will improve it, via RL or whatever other method. Probably we will see it start to work more reliably, but we will not know why (since OAI has not been that 'open' recently).
comment by Pasero (llll) · 2023-03-28T01:18:04.396Z · LW(p) · GW(p)
One last thing I didn’t consider at all until I saw it is that you can use the plug-ins with other LLMs? As in you can do it with Claude or Llama (code link for Claude, discussion for Llama)
A growing number of open source tools to fine-tune LLaMA and integrate LLMs with other software make this an increasingly viable option.
comment by Portia (Making_Philosophy_Better) · 2023-04-06T23:21:13.223Z · LW(p) · GW(p)
The fact that this is sanctioned and encouraged by OpenAI has a massive effect beyond it just being possible.
If it is merely possible, but you have to be smart enough to figure it out yourself, and motivated to do it step by step, you are likely to also have the time to wonder why this isn't offered, and the smarts to wonder whether this is wise, and will either stop, or take extreme precautions. The fact that this is being offered gives the illusion, especially if you are not tech-savvy, that someone smart has thought this through and it is safe and fine, even though it intuitively does not sound like it. This means far, far, far more people using it, giving it data, giving it rights, giving it power, making their business dependent on it and screaming if this service is interrupted, getting the AI personalised to them to a degree that makes it even more likeable.
It feels surreal. I remember thinking "hey, it would be neat if I could give it some specific papers for context for my questions, and if it could remember all our conversations to learn the technical style of explanations that I like... but well, this also comes with obvious risks, I'll get my errors mirrored back to me, and this would open angles of attack, and come with a huge privacy leak risk, and would make it more powerful, so I see why it is not done yet, that makes sense", and then, weeks later, this is just... there. Like someone had the same idea, and then just... didn't think about safety at all?
comment by Portia (Making_Philosophy_Better) · 2023-04-06T23:00:13.601Z · LW(p) · GW(p)
Serious question... how is this different from baby AGI?
I'd been thinking about that a while ago - it is not like the human brain is one thing that can do near everything; it is a thing that can determine what categories of to-dos there are, and invoke modules that can handle them. The way my brain processes a math question, vs. a language question, vs. visual processing, vs. motion control relies on wholly different systems, often ones that developed independently and at different times, but simply interface neatly. My conscious perception often only pertains to the results, not their production; I have no idea how my brain processes a lot of what it processes. And so I figured, well, it does not matter if chatGPT sucks at math and spatial reasoning, insofar as we have AIs which don't, so if it can recognise its limitations and connect to them for answers, wouldn't that be the same? And an LLM is well positioned for this; it can logically reason, has extensive knowledge, can speak with humans, can code, can interpret websites.
And isn't this the same? It runs out of knowledge; it googles it. It needs math; it invokes Wolfram Alpha. It needs to make an image; it invokes Dall-e. It needs your data; it opens your gmail and todo app and cloud drive. It runs into a different problem... it googles which program might fix it... it writes the code to access it based on prior examples... I saw this as a theoretical path to AGI, but had envisioned it as a process that humans would have to set up for it, figuring out whether it needed another AI and which one that would be, and individually allowing the connections based on a careful analysis of whether this is safe. Like, maybe starting with allowing it to access a chess AI, cause that seems harmless. Not allowing practically anyone to connect this AI to practically anything. This is crazy.
comment by M. Y. Zuo · 2023-03-29T12:35:20.686Z · LW(p) · GW(p)
Without going too far off track, quite a lot of AI plug-ins and offerings lately are following the Bard and Copilot idea of ‘share all your info with the AI so I have the necessary context’ and often also ‘share all your permissions with the AI so I can execute on my own.’
I have no idea how we can be in position to trust that. We are clearly not going to be thinking all of this through.
I think the 'we' here needs to be qualified.
There are influential people and organizations who don't even trust computers to communicate highly sensitive info at all.
They use pen and paper, typewriters, etc...
So of course they wouldn't trust anything even more complex.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-04-07T00:07:17.350Z · LW(p) · GW(p)
I am really not on a Putin level of paranoia.
Just the "these dudes accidentally leaked the titles of our chats to other users, do I really want them to have my email and bank data" level.