Comment by rekrul on Developmental Stages of GPTs · 2020-07-27T08:12:17.138Z · score: 6 (6 votes) · LW · GW

Another small silver lining is we don't have to worry about making sure our alignment tools and processes can generalize, just that they scale. So they can be as tailor-made to GPT as we want. I don't think this buys us much, as making an effective scalable tool in the first place seems much harder than generalizing it.

Agreed, GPT is very alien under the hood even though it's mimicking us, and that poses some problems. I'm curious, however, just how good its mimicry of us is/is going to be, and more specifically its moral mimicry. If it shares the same conceptual understanding of "the right thing to do" as we do, maybe there's some way we can leverage that if it's good enough at it early on. Off the top of my head, I can't recall GPT being given unique moral dilemmas, but it'd be interesting to see if it's currently on the right path or not.

Also, has anyone made a chart showing plausible GPT level arrival dates yet? That seems like it would be very nice to have around to reference.

Comment by rekrul on To what extent is GPT-3 capable of reasoning? · 2020-07-22T11:15:33.667Z · score: 5 (4 votes) · LW · GW

You've given me a lot to think about (and may have even lowered my confidence in some of my assertions). Kudos!

I do still have some thoughts to give in response though, but they don't really function as very in-depth responses to your points, as I'm still in the process of ruminating:

  • I agree with you that GPT-3 probably hasn't memorized the prompts given in the OP; they're too rare for that to be worth it. I just think it's so big and has access to so much data it really doesn't need to solve prompts like that. Take the Navy Seal Copypasta prompts Gwern did as an illustration. Those were essentially GPT-3 belching out its connections in a very funny fashion, a lot of which were very weird/obscure. I just think people aren't truly appreciating the huge scope of this machine, and in a sense are underestimating it.

  • In some sense, I wish we could test GPT-3 the way we test animals for reasoning. Text is different from real life though. If we put a totally new object in front of a crow, it could interact with it, learn how it works, and use it to solve a difficult task, whereas I don't feel we can necessarily do the same with GPT-3. The closest we get, in my opinion, are the prompts where a novel word is introduced, but we usually have to describe it enough for the prompt to work, and I feel like that defeats the discovery angle, which I feel is important.

  • Actually, now that I'm on this train of thought, a good prompt that could convince me there's some kind of reasoning going on could be built upon that. Perhaps, a prompt where a fictional very-alien animal (by this I mean, very unlike Earth animals, so with very weird "abilities") is in conflict with another fictional very-alien animal, but we don't give GPT much information about the two animals and the overall scenario, and we somehow force it to interrogate us until it understands enough to submit an accurate story about how the conflict could play out. This test is interesting, but I don't know how viable it is, we would need to get very creative and design two alien animals in a very in-depth manner, as well as how they interact, the environment, the planet etc. Perhaps a variant of this can be devised to reduce the difficult creative workload while still retaining the weird & alien (and thus less dataset interference) nature of it. I also am not familiar with GPT in a role as an "interrogator", and am not sure if this can be done currently. It tends to be the one interrogated, not the other way around.

  • I think a reason why I'm bullish about the pattern matching vs reasoning distinction is caution. Like you said, humans who are experienced enough can skip the reasoning part and just start pattern-matching in the domains they're familiar with. GPT-3 is very "experienced" in its own weird way, and its amazing pattern-matching abilities could let it get past the obstacles we place while, unlike us, skipping the reasoning step entirely. So I feel like if we get convinced an AI is reasoning when it's not, we could deploy it in circumstances where its lack of reasoning could cause problems and maybe even damage. I don't think this is much of a possibility with GPT-3, but it could be with future versions, so I'd prefer to be cautious.

  • I had some other vague thoughts, but I've been awake for like 20 hours, and my brain's weird but maybe useful tangent about alien animals chased them all away. Apologies lol.

Comment by rekrul on To what extent is GPT-3 capable of reasoning? · 2020-07-21T06:27:18.306Z · score: 7 (4 votes) · LW · GW

I recognize the points you are making, and I agree, I don't want to be a person who sets an unfeasibly high bar, but with how GPT-3 was developed it's really difficult to put one that isn't near that height. If GPT-3 was instead made with mostly algorithmic advances instead of mostly scaling, I'd be a lot more comfortable placing said bar and a lot less skeptical, but it wasn't, and the sheer size of all this is in a sense intimidating.

The source of a lot of my skepticism is GPT-3's inherent inconsistency. It can range wildly from its high-quality output to gibberish, repetition, regurgitation etc. If it did have some reasoning process, I wouldn't expect such inconsistency. Even when it is performing so well that people call it "reasoning", it has enough artifacts of its "non-reasoning" output to make me skeptical (logical contradictions, its tendency to repeat itself, i.e. "Because Gravity Duh" like in the OP, etc).

This is unfortunately just a very hard subject to get to the truth on. Illusions are meant to be convincing. Designing a test is difficult purely because the machine is, like I said, intimidating. It has so much knowledge inside its "head". It's pretty much safe to just assume it knows just about everything on the internet prior to its creation. How do we design a test around that? Even when we get weird and start asking about stuff like reversing gravity and clouds becoming a solid substance, the internet is huge and full of weird questions, and the model is big enough to have representations of just about everything, including the weird.

So the only tests I can think to design around that are wholly unsatisfying. Like, asking it to replicate using reasoning some fact about something we discover next week that we also discovered purely through reasoning. This is technically a test, in the same way garbage is technically edible.

The ideal answer is we develop transparency & interpretation tools that allow us to crack open the huge black-box that is GPT-3 and truly grok how it works. But I don't hold out hope for this, as the ML community is for some reason I can't understand relatively uninterested in these types of tools, which is sad and somewhat worrying.

So to cut this short, I think what I stated near the beginning about the inconsistencies of the model is the best bet. If that's fixed, if you ask say GPT-4 a question and it almost always correctly determines whether you want a joke or a serious answer, if it lacks logical contradictions in a hypothetical news article you ask it to make, if it doesn't carry over all of the same errors as previous models, if it doesn't just regurgitate and repeat itself etc., I'd be a lot less skeptical about reasoning, as by that point it either has it, or its pattern matching has somehow scaled well enough to iron out all of the problems and can probably be just as good as reasoning.

These are my thoughts, rambling as they may be. I apologize if this doesn't fully answer your comment, as I said this whole thing is just difficult to deal with, which isn't unexpected since it's the peak of modern technology. I'm also astonishingly bad at putting my thoughts down into words. If GPT-3 had anything like thoughts, it'd probably be better at this than me lol.

Comment by rekrul on To what extent is GPT-3 capable of reasoning? · 2020-07-21T01:23:05.910Z · score: 2 (3 votes) · LW · GW

In a very loosely similar sense (though not at all accurate architectural sense) to how AlphaGo knows which moves are relevant for playing Go. I wouldn't say it was reasoning. It was just recognizing and predicting.

To give an example: If I were to ask various levels of GPT (perhaps just 2 and 3, as I'm not very familiar with the capabilities of the first version off the top of my head) "What color is a bloody apple?", it would have a list of facts in its "head" about the words "bloody" and "apple", like one can be red or green, one is depicted as various shades of red, and in some circumstances brown and black, one falls from trees, one is British slang etc. When the word "color" is added in, it does the same thing with that, and a primary color is red. Since all three share "red" as a listed fact, that is incredibly relevant, and most likely to be the correct answer.

This is likely a poorly explained and inaccurate retelling of what GPT exactly does, but essentially I argue it's doing something closer to that than reasoning. It's always been doing that, and now with the third version, it does it well enough to give off the illusion of reasoning.
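To make the "bloody apple" picture above concrete, here is a toy sketch in Python. It is only an illustration of the overlap idea as described in this comment, not a claim about how GPT actually works internally. The association lists and the function name `most_shared_association` are made up for the example.

```python
from collections import Counter

# Hypothetical, hand-written association lists for illustration only.
# Nothing here is extracted from a real model.
ASSOCIATIONS = {
    "color": ["red", "blue", "green", "yellow"],
    "bloody": ["red", "brown", "black", "british slang"],
    "apple": ["red", "green", "falls from trees"],
}


def most_shared_association(words, associations=ASSOCIATIONS):
    """Return the association shared by the most prompt words, or None."""
    counts = Counter()
    for word in words:
        # Use a set so each word votes at most once per association.
        counts.update(set(associations.get(word, [])))
    if not counts:
        return None
    # Iterate in sorted order so ties are broken alphabetically.
    return max(sorted(counts), key=counts.get)


print(most_shared_association(["color", "bloody", "apple"]))  # red
```

"red" wins simply because it is the only entry listed under all three words. No step in the sketch resembles an inference about apples or blood, which is the distinction the comment is trying to draw.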

Comment by rekrul on To what extent is GPT-3 capable of reasoning? · 2020-07-21T00:38:04.103Z · score: 1 (1 votes) · LW · GW

GPT-3 was trained on an astronomical amount of data from the internet, and asking weird hypotheticals is one of the internet's favorite pastimes. I would find it surprising if it was trained on no data resembling your prompts.

There's also the fact that its representations are staggeringly complex. It knows an utterly absurd amount of facts "off the top of its head", including the mentioned facts about muzzle velocity, gravity, etc., and its recognition abilities are great enough to recognize which of the facts it knows are the relevant ones based on the content of the prompt to get to the right answer, and then it outputs it.

That's roughly my view, although I'm not entirely satisfied with how I've explained it. Apologies. So yeah, none of the tests of reasoning so far have made me believe that it wasn't just doing something like the above to get it right. I'd be surprised if it was doing something resembling reasoning, as I don't think large model + large dataset is enough to learn how to do that. You also need a task that is difficult enough, and then maybe it will develop (like real evolution), and I don't think text prediction fits that criterion.

Comment by rekrul on To what extent is GPT-3 capable of reasoning? · 2020-07-20T22:49:02.513Z · score: 5 (4 votes) · LW · GW

Yeah, this sampling stuff brings up arguments about "curating" or "If you rephrase the same question and get a different answer then there is no reasoning/understanding here" which I'm sympathetic to.

I also think categorizing GPT-3's evasiveness, tendency to take serious prompts as joke prompts, etc. as solely the fault of the human is unfair. GPT-3 also shares the blame for failing to interpret the prompt correctly. This is a hard task obviously, but that just means we have further to go, despite the machine's impressiveness already.
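The "rephrase the same question and see if the answer changes" argument could be turned into a rough measurement. Below is a minimal Python sketch, assuming you have some function that maps a prompt to an answer string. The name `agreement_rate` and the `answer_fn` parameter are hypothetical, and the stub in the usage example is a stand-in, not a real model call.

```python
from collections import Counter
from typing import Callable, Sequence


def agreement_rate(answer_fn: Callable[[str], str],
                   paraphrases: Sequence[str]) -> float:
    """Fraction of paraphrases whose answer matches the most common answer.

    answer_fn is a hypothetical stand-in for whatever produces an answer
    from a prompt. Answers are compared after trimming whitespace and
    lowercasing, which is a crude notion of "the same answer".
    """
    if not paraphrases:
        raise ValueError("need at least one paraphrase")
    answers = [answer_fn(p).strip().lower() for p in paraphrases]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


# Usage with a canned stub instead of a real model:
canned = {
    "What color is a bloody apple?": "Red",
    "A bloody apple is what color?": "red ",
    "Name the color of a bloody apple.": "blue",
}
print(agreement_rate(canned.__getitem__, list(canned)))
```

A rate near 1.0 would not prove reasoning, and a low rate would not disprove it, since sampling randomness alone can change answers. It only puts a number on the inconsistency being discussed.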

Comment by rekrul on To what extent is GPT-3 capable of reasoning? · 2020-07-20T22:27:50.252Z · score: 3 (3 votes) · LW · GW

I still haven't been convinced GPT-3 is capable of reasoning, but I'm also starting to wonder if it's even that important. Roughly, all GPT-3 does is examine text, try to find a pattern, and continue it. But it is so massive, and trained on so much data, that the patterns it can "see" and connections it can make are far more expansive than we'd expect. What this means is that while it doesn't try to comprehend any logical questions and then apply some kind of reasoning to answer them, its ability to see patterns, combined with its staggeringly huge amount of data and the connections it's made, allows it to, in a sense, "brute force" the answer anyway. This makes me believe pattern-matching can get us a lot farther than we previously thought it could, and do many things people held up before as needing reasoning.

This is all just the opinions of a guy whose only qualification is reading a lot of other opinions about this thing though.

Comment by rekrul on GPT-3 Fiction Samples · 2020-07-02T06:19:33.733Z · score: 1 (1 votes) · LW · GW

Yeah, that part was very impressive. Personally, I'm not sure I get why it requires reasoning; it seems more like just very advanced and incredible recognition and mimicry abilities to me. But I'm just a casual observer of this stuff, so I could easily be missing something. Hopefully, since we can all play around with GPT-3, people continue to push it to its limits, and we can get an accurate picture of what's really going on under the hood, and whether it's really developed some form of reasoning.

Comment by rekrul on GPT-3 Fiction Samples · 2020-06-28T03:34:18.887Z · score: 2 (2 votes) · LW · GW

I'd love to know your reasoning here. I've been very impressed with GPT-3, but not to the extent I'd majorly update my timelines.

Comment by rekrul on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-29T17:20:16.502Z · score: 5 (3 votes) · LW · GW

That's a fair point, but I don't think you need to have a livestreamed event to gain access to professional Starcraft players beyond consultation. I'm sure many would be willing to be flown to DeepMind HQ and test AlphaStar in private.

Comment by rekrul on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-29T05:30:26.536Z · score: 4 (2 votes) · LW · GW

The "it" I was referring to was these showmatches; I worded that poorly, my bad. I just don't see the point in having these uneven premature showmatches, instead of doing the extra work (which might be a lot) and then having fair showmatches. All the current ones have brought is confusion and people distrusting DeepMind (due to misleading comments/graphs), and whenever they do finally get it right, it won't get the same buzz as before, because the media and general public will think it's old news that's already been done. Having them now instead of later just seems like a bad idea all around.

Comment by rekrul on [Link] Did AlphaStar just click faster? · 2019-01-29T05:22:09.624Z · score: 6 (4 votes) · LW · GW

There were definitely some complaints around OAI5's capabilities. Besides criticism over its superhuman reaction speed, the restrictions placed upon the game to allow the AI to learn it essentially formed its own weird meta the humans were unfamiliar with, so the AI was using tactics built for an entirely different game than the humans.

Honestly, I haven't been very impressed with either AI through these showmatches, because it's so hard to tell what their "intelligence" level is, as it's influenced heavily by their unfair capabilities. They need to both put a lot more effort into limiting their AI's inherent and unfair advantages, and other companies who want to try their hand at getting their AI to conquer a video game need to not repeat these disappointing showmatches.

Comment by rekrul on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-25T05:16:07.592Z · score: 5 (5 votes) · LW · GW

I'm a relative layperson, so I honestly don't know. Maybe no new tricks are needed. But if that's the case, why not just do it and not have all this confusion flying about?

These big uneven showmatches do a very poor job of highlighting the state of the art in AI tactics, as the tactics the AI uses seem to be heavily influenced by its "unfair" capabilities. I can't really tell if these agents are generally smart enough to play the full game evenly but use unfair strategies because they're allowed to, or if they're dumber agents who couldn't play the full game evenly so they're propped up by their unfair capabilities and earn victories by exploiting them.

Comment by rekrul on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-25T03:08:06.789Z · score: 7 (7 votes) · LW · GW

After both Dota and this, I'm starting to come to the conclusion that video games are a poorer testbed for examining the "thinking" capabilities of modern AI. Unless a substantial amount of effort is put in to restrict the AI to being roughly equal "physically" to humans, it becomes very difficult to determine how much of the AI's victories were due to "thinking" or due to being so "physically" superior it pulls off moves humans are simply incapable of doing even if they thought of them (or having other advantages like being able to see all of the visible map at the same time). Board games didn't really have that "physical" problem, but unfortunately we're all out of those.

As such, gauging exactly how "smart" these video game AIs like OAI5 and AlphaStar are is kinda difficult to do.

Comment by rekrul on Electrons don’t think (or suffer) · 2019-01-04T10:17:36.083Z · score: 1 (1 votes) · LW · GW

Even if something like an electron has some weird degraded form of consciousness, I don't see why I should worry about it suffering. It doesn't feel physical pain or emotional pain, because it lacks the physical processes for both of them, and any talk of it having "goals" and suffering when it fails to reach them just reeks of anthropomorphism. I just don't buy it.

Comment by rekrul on Reinterpreting "AI and Compute" · 2018-12-27T00:06:39.173Z · score: 1 (1 votes) · LW · GW

I agree that researchers can take shortcuts and develop tricks, but I don't see how that shortens it to something as incredibly short as 1 year, especially since we will be starting with parts that are far worse than their equivalent in the human brain.

Comment by rekrul on Reinterpreting "AI and Compute" · 2018-12-26T23:01:06.720Z · score: 5 (3 votes) · LW · GW

"We could assume—by analogy with human brain training in childhood—that to train one model of human mind, at least 1 year of training time is needed (if the computer is running on the same speed as human mind)."

Could you clarify here? I'm no expert, but I'm pretty sure human brains in childhood take a lot longer than a year to learn everything they need to survive and thrive in the real world. And they have a lot more going for them than anything we'll build for the foreseeable future (better learning algorithm, better architecture built by evolution, etc.)