NormanPerlmutter's Shortform
post by NormanPerlmutter · 2023-05-28T23:16:32.974Z · LW · GW · 2 comments
2 comments
Comments sorted by top scores.
comment by NormanPerlmutter · 2023-05-28T23:16:33.066Z · LW(p) · GW(p)
I just downloaded MS Edge so that I could use Bing AI, and I asked it to find me a Brazilian hammock more than 6 feet wide. Despite repeated questioning, it kept giving me hammocks less than 6 feet wide (but more than 6 feet long). Even after I pointed out the error explicitly, it kept making the same mistake, and finally Bing gave up and told me it couldn't help. For example, it would list two possibilities, state the length and width of each, and the width would be less than 6 feet in each case.
Given all the amazing stuff we've seen out of AI lately, I'm kind of amazed it wasn't more successful. I'm guessing they don't make Brazilian hammocks in that size (not sure why, as they make Mayan hammocks much wider than that, but anyway...).
Is this a blind spot for Bing? Or does Microsoft prefer for it to turn up bad results rather than say that no such thing exists?
Replies from: karl-roekaeus
comment by Bastumannen (karl-roekaeus) · 2023-05-29T17:11:54.368Z · LW(p) · GW(p)
I don't know, but I had a similar experience with ChatGPT. After hearing from various sources that it is pretty decent at chess, I decided to challenge it to a friendly game. I actually expected to lose, since I am not much of a chess player myself, and since I had been really impressed by the LLM's reasoning ability in conversations on topics where I am an expert (depending on your definition of that word).
Anyway, ChatGPT insisted on playing as white. Then it said: "Ok, let's begin. You get the first move." So I had to explain to it that, according to the rules of chess, it's white that moves first. As usual it apologized, and the game began. After each move, it printed an image of the board on the screen; I did not pay attention to that since I had my own physical board. (We were exchanging moves by stating the start square and the end square of each move, i.e. not standard chess notation.) A couple of moves in, I noticed that the robot had moved one of its bishops right through one of its own pawns. So I remarked that this is not a legal move. The robot agreed, and stated that it is not allowed to capture its own pawns. It made another move, and then it was my turn. I stated my move, but this time I also looked at the board that the robot displayed on the screen. I realized that something was very wrong: I had a knight where I should have had a bishop. So I went back to trace where this error had started, but in the end it became too complicated and we decided to start a fresh game. That game also lasted a couple of moves before strange things started to happen. So we agreed to start fresh again, and now I insisted on playing as white (I was getting tired of having my physical board "upside down" compared to the board on screen). The robot managed a couple of moves again before it placed one of its rooks outside of the board.
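For what it's worth, the kind of check the robot failed at is easy to write down. Below is a minimal sketch in Python, assuming moves are given as a start square and an end square as in the game above. The board representation (a dict from square names to piece codes like 'wB') and the function names are my own hypothetical choices, and it only covers bishops, not the full rules of chess.

```python
def square_to_xy(square):
    """Convert a square name like 'c1' to zero-based (file, rank) coordinates."""
    return ord(square[0]) - ord("a"), int(square[1:]) - 1


def xy_to_square(x, y):
    """Convert zero-based (file, rank) coordinates back to a name like 'c1'."""
    return chr(ord("a") + x) + str(y + 1)


def on_board(square):
    """Return True if the square lies on the 8x8 board."""
    x, y = square_to_xy(square)
    return 0 <= x < 8 and 0 <= y < 8


def bishop_move_is_legal(board, start, end):
    """Check a bishop move given as start and end squares.

    `board` maps occupied squares to piece codes such as 'wB' or 'bP'
    (hypothetical encoding: color letter, then piece letter).
    Only geometry, blocking, and own-piece capture are checked.
    """
    if not (on_board(start) and on_board(end)) or start not in board:
        return False
    x0, y0 = square_to_xy(start)
    x1, y1 = square_to_xy(end)
    if start == end or abs(x1 - x0) != abs(y1 - y0):
        return False  # bishops move only along diagonals
    step_x = 1 if x1 > x0 else -1
    step_y = 1 if y1 > y0 else -1
    x, y = x0 + step_x, y0 + step_y
    while (x, y) != (x1, y1):
        if xy_to_square(x, y) in board:
            return False  # a piece of either color blocks the path
        x, y = x + step_x, y + step_y
    target = board.get(end)
    # The destination must be empty or hold a piece of the other color.
    return target is None or target[0] != board[start][0]
```

With a white bishop on c1 and a white pawn on d2, this rejects both c1 to e3 (moving through the pawn) and c1 to d2 (capturing its own pawn), and it also rejects any end square that is off the board.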
Now I was tired so I just said "ok I give up, you're too good at this" (I don't want to offend it, maybe it will spare me in a human zoo when it takes over). But jokes aside, why is it so inconsistent? Maybe OpenAI has dumbed it down so as not to scare people; maybe it was investigating how I would react; or maybe it just had a bad day, and if I take the time to play it again it will be much better. What I mean by the last option is that all these games took place within the same chat, so the LLM remembers things from earlier. As far as I know, it typically runs the whole chat up to the current point through the NN to get the next word (next "token", to be more precise). So after this bad start, as less and less of the chat made sense, it may have strayed farther and farther from giving good moves.
So as for your search on Bing ("Powered by ChatGPT"), maybe try the search again and see if you get a better result. Also, maybe try different words than last time, or try explaining a little better; try pretending you're talking to a human, if you did not do that last time.
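To make that last idea a bit more concrete, here is a toy sketch in Python of a generation loop where the whole history is fed back in at every step. The names `generate` and `toy_next_token` are hypothetical, and the "model" is just a stand-in that repeats whatever token is most common so far. A real LLM works nothing like this internally; the only point is that every new token is chosen from everything that came before it, including earlier mistakes.

```python
from collections import Counter


def generate(context, next_token, max_new_tokens):
    """Extend `context` one token at a time.

    `next_token` is called with the entire history at each step,
    so everything generated so far influences what comes next.
    """
    tokens = list(context)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def toy_next_token(tokens):
    """Hypothetical stand-in for the network: echo the most common token so far."""
    return Counter(tokens).most_common(1)[0][0]


# A history dominated by "bad" keeps producing "bad";
# a fresh history dominated by "good" keeps producing "good".
messy_chat = generate(["good", "bad", "bad"], toy_next_token, 3)
fresh_chat = generate(["good", "good", "bad"], toy_next_token, 3)
```

In this toy, the only way to get different behavior is to change the history, which is roughly the argument for starting a new chat instead of continuing a derailed one.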