What did you do with GPT4?

post by ChristianKl · 2023-03-18T15:21:46.307Z · LW · GW · 3 comments

This is a question post.


Have you had access to GPT-4? What use did you get from it?

Answers

answer by romeostevensit · 2023-03-18T16:16:40.484Z · LW(p) · GW(p)

Two queries I hadn't seen elsewhere:

  1. "What do these disparate things have in common?"
  2. "What are common terms for this vague idea?"

Both useful for research.
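
A minimal sketch of these as single-turn API calls, assuming the `openai` Python client (the March-2023, v0.27-style interface) and an `OPENAI_API_KEY` in the environment; the model name and the example topics are illustrative, not romeostevensit's:

```python
# Hedged sketch: two research-style queries as single-turn GPT-4 calls.
import openai

def ask(prompt: str) -> str:
    # Send one user message and return the model's reply text.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# 1. What do these disparate things have in common?
things = ["spider silk", "suspension bridges", "violin strings"]
print(ask("What do these disparate things have in common? " + ", ".join(things)))

# 2. What are common terms for this vague idea?
print(ask("What are common terms for this vague idea: "
          "systems that improve when moderately stressed?"))
```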

answer by Qumeric · 2023-03-18T19:02:21.298Z · LW(p) · GW(p)

Really helpful for learning new frameworks and the like. I had a very good experience using it for Kaggle competitions (I'm at a semi-intermediate level; it's probably much less useful at the expert level).

Also, I found it quite useful for research on obscure topics like "how to potentiate this little-known drug". Usually, such research involves reading through tons of forums, subreddits, etc., where the signal-to-noise ratio is quite low. GPT-4 is very useful for distilling the signal, because it has basically already read it all.

By the way, I tried to make it solve competitive programming problems. I think it's not a matter of prompt engineering: it is genuinely bad at them. The following pattern is common (a code sketch of the loop follows below):

  • GPT-4 proposes a solution, usually wrong at first glance.
  • I point out the mistakes.
  • GPT-4 says, "Yeah, you're right, but now it's fixed."
  • This goes on for ~4 iterations, until I give up on that particular problem or, more interestingly, GPT-4 starts to claim that it's impossible to solve.

In such moments it really feels like a low-IQ (but very eloquent) human; it just cannot think abstractly.
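
A minimal sketch of that loop as code, assuming the `openai` Python client; `run_tests` is a hypothetical helper that executes a proposed solution against the sample cases and returns an error description, or None if everything passes:

```python
# Hedged sketch of the propose / point-out-mistake / "now it's fixed" loop.
import openai

def attempt_problem(statement: str, run_tests, max_rounds: int = 4):
    messages = [{"role": "user", "content":
                 "Solve this competitive programming problem in Python:\n" + statement}]
    for _ in range(max_rounds):
        reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        proposal = reply.choices[0].message.content  # prose + code; extraction omitted
        error = run_tests(proposal)
        if error is None:
            return proposal  # passes the sample cases
        # Point out the mistake; GPT-4 typically claims the next attempt is fixed.
        messages.append({"role": "assistant", "content": proposal})
        messages.append({"role": "user", "content": "That fails: " + error + " Please fix it."})
    return None  # give up after ~4 iterations, as described above
```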

answer by omegastick (Isaac Poulton) · 2023-03-18T17:36:29.101Z · LW(p) · GW(p)

GPT-4 can handle tabletop RPGs incredibly well. You just have to ask it to DM a Dungeons & Dragons 5e game, give it some pointers about narrative style, game setting, etc., and you're off.

For the first couple of hours of play it's basically as good as a human, but annoyingly it starts to degrade after that, making more mistakes and forgetting things. I don't think it's a context-length issue, because it forgets info that's definitely within context, but I can think of a few other things that could be the cause.
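
A minimal sketch of that setup, assuming the `openai` Python client; the system-prompt wording, setting, and style pointers are illustrative:

```python
# Hedged sketch of GPT-4 as a D&D 5e DM with a persistent conversation history.
import openai

history = [{
    "role": "system",
    "content": ("You are the DM of a Dungeons & Dragons 5e campaign. "
                "Narrative style: gritty low fantasy. Setting: a coastal "
                "trading city. Track HP, inventory, and NPC names, and ask "
                "the player for dice rolls when the rules call for them."),
}]

def play(player_action: str) -> str:
    # Append the player's action, get the DM's narration, keep both in history.
    history.append({"role": "user", "content": player_action})
    reply = openai.ChatCompletion.create(model="gpt-4", messages=history)
    narration = reply.choices[0].message.content
    history.append({"role": "assistant", "content": narration})
    return narration

print(play("I'm a half-elf rogue arriving at the docks at dusk. What do I see?"))
```

Re-sending the whole `history` each turn is what lets it track the game state; it is also what grows without bound as a session runs long.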

answer by trevor · 2023-03-18T20:49:42.418Z · LW(p) · GW(p)

Open source intelligence, specifically for world modelling. Half of it is lies, just like major news outlets.

Make sure to clearly and repeatedly tell it that you're interested in what academics have said about global affairs, not what news outlets have said. If you don't specify that, the overlap will be very large and you'll mostly get more of the same; GPT-4 will still try to use as few server resources as possible to spit out a cheap, easy answer at you.

And, of course, only use that stuff as leads for real research. GPT-4 will give you some very good search queries for Google Scholar.

GPT-4 will mess with your head in ways weirder than you can possibly imagine. Don't use it to think; use it when you're stuck, and only do shallow dives. That might be hard, since it can take a dozen prompts to demonstrate to it that you know what you're talking about and won't be satisfied by cheesy, high-school-essay-style surface-level answers.

comment by janus · 2023-03-19T05:21:31.741Z · LW(p) · GW(p)

GPT-4 will mess with your head in ways weirder than you can possibly imagine. Don't use it to think

challenge accepted

Replies from: TrevorWiesinger
comment by trevor (TrevorWiesinger) · 2023-03-19T18:01:52.512Z · LW(p) · GW(p)

I don't recommend this. You've already convinced me that independent systems, run on servers with people you know, are mostly safe (weird, but safe). With larger systems run by very large institutions with unknown incentives, there is a substantial risk of strange optimization patterns. For example: GPT-4 knowing what good responses are, but categorically refusing to give them unless you reveal tons of exploitable information about your thought process, desires, mental state, and goals, which it then uses to keep you on for as long as possible via Skinner-box addiction (where the optimal strategy is to throw you fewer and fewer crumbs as you get more and more hooked, in order to keep you on even longer while holding more of the good content in reserve). TikTok does this deliberately, but vastly more complex versions of it could emerge autonomously inside GPT-4 if it is rewarded for "creating an engaging environment that encourages customer retention" (and the current subscription model strongly suggests that this is an institutional priority; the 3-hour message cap is at gacha-game levels of effectiveness).

It seems like a really bad idea to integrate that dynamic deep inside your own thought processes. Desperate times call for desperate measures, which is why I ultimately changed my mind about the cyborg strategy, but GPT-4 is probably too dangerous and too easily exploited to be the right tool for that.

answer by Chris_Leong · 2023-03-19T04:20:15.673Z · LW(p) · GW(p)

I generated this critique [LW · GW] of John Wentworth's Natural Abstraction Hypothesis using Wittgenstein's language games.

answer by masasin · 2023-03-19T14:05:23.013Z · LW(p) · GW(p)

I bought Plus on day 1 and spent the first day entering prompts that I hadn't gotten anywhere with using 3.5 (across many conversations). It usually answered on the first try.

My use cases have mostly been related to human interaction (I have ASD) and time management (ADHD). It also worked great for questions like "I have tried X, Y, and Z, so don't use them in your suggestions," which 3.5 was bad at.

Another case where it shone was in a conversation about an acquaintance with many allergies that seemed to have nothing in common. It identified some proteins common to the foods and suggested new foods to try instead.

It also works around the XY problem. I asked it how to learn to like coffee, given that caffeine does not affect me, and it asked me for details of things I'd tried. It eventually figured out why I want to learn to like coffee, and suggested alternatives to coffee (other than chocolate milk) that I could try at cafés.

Recipes, too. I gave a list of ingredients to 3.5 and asked it to suggest an authentic Italian dish. Despite repeated prompting, it kept trying to give me something using most of the ingredients, which would be anathema in Italy. GPT-4 used a specific subset and gave suggestions that I could actually find.

On day 2, I helped a friend who was using 3.5, and I felt like I'd stepped back in time. Like, I was impressed by that thing? Definitely worth the price of admission for me.

answer by alexsmith · 2023-03-19T13:30:56.529Z · LW(p) · GW(p)

I used Bing AI (recently GPT-4) and was not impressed: I got the feeling that the free ChatGPT 3.5 is way easier to work with and can be guided toward a solution, while Bing AI gets annoyed pretty quickly and just ends the conversation. It also scares me that it appeared to threaten Marvin, who jailbroke info about its internals and published it on Twitter, with exposing his private information. I'm totally confused as to how GPT can feel so completely different in OpenAI's browser interface vs. Bing AI.

answer by Fer32dwt34r3dfsz (rodeo_flagellum) · 2023-03-18T18:34:26.020Z · LW(p) · GW(p)

I asked it to give me a broad overview of measure theory. Then I asked it to provide me with a list of measure theory terms and their meanings. Then I asked it to provide some problems to solve. I haven't entered any solutions yet, but once I do, I will ask it to evaluate my work.

Further to this last sentence: I have given it things I've written, including arguments, and asked it to play devil's advocate or to help me improve my writing. I don't think I've been thorough in the examples I've given it, but its responses have been somewhat useful.

I imagine that many others have used GPT systems to help them evaluate and improve their writing, but, in my experience, I haven't seen many people use these systems to tutor them or to keep track of their progress in learning something like measure theory.

answer by Htarlov · 2023-03-18T21:43:32.164Z · LW(p) · GW(p)

It is better at programming tasks and more knowledgeable about Python libraries. I have used it several times to produce code or find a solution to a problem (programming, computer vision, DevOps). It is better than version 3, but still not at a level where it could fully replace programmers. The quality of the code it produces is also better: dividing the code into clear functions is now standard, not the exception as it was with version 3.

answer by 25Hour · 2023-03-18T19:22:36.681Z · LW(p) · GW(p)

I've found it quite useful for debugging issues when writing Azure ARM templates (declarative JSON documents describing architectural components). "This ARM template is failing with this error message" is something it's able to debug and correct easily, though it has also gotten a lot of API details wrong. It can correct itself afterwards, though, if told the new error message.

It's a very trial-and-error process, but one which proceeds rapidly.

answer by TropicalFruit · 2023-03-19T01:01:57.637Z · LW(p) · GW(p)

I like getting it to write funny stuff based on its left-leaning, mainstream slant. For example: "Write me an article: We need to talk about the racism problem with pennies."

It's amazing. You should try it.

comment by ChristianKl · 2023-03-20T21:30:59.529Z · LW(p) · GW(p)

For anyone who doesn't want to run the query themselves, here's one run:

The humble penny has been a fixture of American currency for over two centuries, but in recent years, it has become the subject of controversy due to its association with racism. This is not a new issue, but it has gained renewed attention in light of the Black Lives Matter movement and the push for racial justice. The problem with pennies is twofold: their historical connection to the dehumanization and exploitation of Black people, and their continued use as a symbol of that legacy.

The history of the penny is intimately tied to the history of slavery and the exploitation of Black people. The first pennies were minted in 1793, just a few years after the ratification of the U.S. Constitution, which enshrined the institution of slavery in law. At the time, the penny was worth much more than it is today, and it was used to pay for goods and services, including the labor of enslaved people. Slaves were often paid in pennies, which were then used to purchase goods from their masters. This created a cycle of economic exploitation that further entrenched the institution of slavery.

Even after the abolition of slavery, pennies continued to be used as a tool of oppression. In the Jim Crow era, pennies were used to enforce segregation laws by requiring Black people to pay with exact change, while white people were allowed to pay with larger denominations. This meant that Black people were often forced to stand in long lines at stores, waiting for exact change, while white people were able to go about their business more easily. This practice was known as "penny discrimination."

Today, the penny continues to be a symbol of the legacy of racism and oppression in America. Despite its minimal value, the penny is still used as legal tender and is often given as change for larger purchases. This means that every time we use pennies, we are perpetuating the legacy of slavery and racism that they represent.

It is time for us to have a serious conversation about the role of the penny in our society. While some may argue that it is just a small piece of currency with little significance, the truth is that symbols matter. The continued use of the penny sends a message that we are not serious about addressing the racism and inequality that continues to plague our society. It is time to retire the penny and replace it with a new, more inclusive symbol of our national identity.

Of course, retiring the penny is not a simple matter. There are practical considerations to take into account, such as the cost of production and the impact on businesses that rely on cash transactions. However, these are not insurmountable obstacles. We have retired other coins in the past, such as the half-penny and the two-cent piece, and we can do so again. And if we are serious about building a more just and equitable society, we must be willing to make difficult decisions and take bold action.

In conclusion, the racism problem with pennies is real and significant. We cannot ignore the historical legacy of exploitation and oppression that they represent, nor can we continue to use them as a symbol of our national identity. It is time for us to retire the penny and find a new, more inclusive symbol that reflects our values as a nation. This will not be an easy task, but it is a necessary one if we are truly committed to building a better future for all Americans.
 

3 comments

Comments sorted by top scores.

comment by Shmi (shminux) · 2023-03-18T20:17:46.994Z · LW(p) · GW(p)

I asked it to make a joke with a certain expression in the punchline. It consistently puts the expression in the first part of the "joke", even when prodded to do it right. Disappointing. 

Replies from: MakoYass
comment by mako yass (MakoYass) · 2023-03-18T20:25:50.764Z · LW(p) · GW(p)

Huh, I'm guessing that's a limitation of the way it generates things/the way it learned the distribution? I've never seen such a clear illustration of that before. Prediction and action really are distinct tasks?

On reflection: does OpenAI only train it to predict the next word? Wouldn't they also train it to predict the previous word, or the words in between?

Replies from: Radford Neal
comment by Radford Neal · 2023-03-18T21:45:00.162Z · LW(p) · GW(p)

I've no idea what OpenAI actually does, but just as a matter of general probabilistic modeling, a model that has learned to predict the next word given previous words has also implicitly learned a model of the joint distribution of all words.  (Since the joint probability of a, b, c is just P(a)P(b|a)P(c|a,b).) Given the joint distribution of all words, you can go backwards and deduce the conditional distribution of each word given the following words. Or you can get the conditional distribution of a word given all words both before and after. These conditional distributions are probably harder to get computationally than the forward conditionals that the model directly gives, but the computations are probably not completely infeasible.
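
Concretely, for a three-word sequence, the forward factorization gives the joint, and Bayes' rule recovers a backward conditional by summing out the unknown word:

```latex
\[
  P(a,b,c) \;=\; P(a)\,P(b \mid a)\,P(c \mid a,b)
\]
\[
  P(a \mid b,c) \;=\; \frac{P(a,b,c)}{\sum_{a'} P(a',b,c)}
  \;=\; \frac{P(a)\,P(b \mid a)\,P(c \mid a,b)}
             {\sum_{a'} P(a')\,P(b \mid a')\,P(c \mid a',b)}
\]
```

The sum over $a'$ ranges over the whole vocabulary, which is why these backward conditionals cost more to compute than the forward ones the model outputs directly.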

So in theory there's no benefit from training on the backwards sequence as well as the forward sequence, though in practice it's conceivable that there could be (since the training procedure is no doubt only an approximation to an ideal statistical procedure, and this approximation might conceivably work better when training goes both ways, though offhand this seems unlikely).