Posts

Eliciting Credit Hacking Behaviours in LLMs 2023-09-14T15:07:37.830Z

Comments

Comment by omegastick (isaac-poulton) on OpenAI: Altman Returns · 2023-11-30T21:22:07.908Z · LW · GW

Some people have strong negative priors toward AI in general.

When the GPT-3 API first came out, I built a little chatbot program to show my friends/family. Two people (out of maybe 15) flat out refused to put in a message because they just didn't like the idea of talking to an AI.

I think it's more of an instinctual reaction than something thought through. There's probably a deeper psychological explanation, but I don't want to speculate.

Comment by omegastick (isaac-poulton) on What's your standard for good work performance? · 2023-09-28T15:08:53.577Z · LW · GW

Rather than having objective standards, I find a growth-centric approach to be most effective. Optimizing for output is easy to Goodhart, so as much as possible I treat output as a metric rather than a goal. It's important that I'm getting more done now than I was a year ago, for example, but I don't explicitly aim for a particular output on a day-to-day basis. Instead I aim to optimize my processes and improve my skills, which leads to increased output. That applies not just to good work performance, but to many things.

  • > How much do you get done in a typical month/half year?

    Measuring this objectively is hard, but roughly one large project (big new feature, new application, major design overhaul) per month, or more if the projects I'm working on are smaller.
     
  • > How much do consider aspirational but realistic to get done in a typical month/half year?

    I've managed to get projects that I'd normally finish in a month done in 2 weeks or so by crunching hard, but I'm generally pretty consistent with output on the scale of months/half years. I definitely don't aim for that.
     
  • > How much do you consider on the low end but okay to get done in a typical month/half year?

    I wouldn't be too upset if a project goes over by 25% due to low output (they can go over longer if there are unexpected issues, but that's another thing). Again though, I'm pretty consistent on the scale of months/half years, so this rarely happens.
     
  • > What kind of output would you want to see out of a researcher/community organiser/other independent worker within a month/half a year to be impressed/not be disappointed? (Assuming this is amount is representative of them)

    I don't have objective standards here. If I get the impression they are genuinely putting in a good effort and improving with time, I'm happy. Different people have different strengths, and a person might work quite slowly relative to the average, but produce very high quality work. If they continue improving their output, eventually it will be high (for whatever standard of "high" you like). If they're putting in effort and not improving, they might not be in the right line of work, and then I'd be disappointed.
     
  • > What's the minimum output would you want to see out of a researcher/community organiser/other independent worker to be in favour of them getting funding to continue their work?  (Assuming this is amount is representative of them)

    This is a knapsack problem. Calculate the score = (expected output * expected value of work per unit of output) / funding required for each person seeking funding, sort the list in descending order, and allocate funding in order from top to bottom (a rough sketch is below, after this list). You don't need to fully solve the knapsack problem here, because leftover funding can be carried over.
     
  • > What's the minimum output would you want to see out of your friend to feel good about them continuing their current work? (Assuming this is amount is representative of them)

    Their average output over the last 12 months should be higher than their average output over the previous 12, by some non-insignificant amount.
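
As a rough illustration of the greedy allocation described in the funding bullet above (all names, numbers, and units are purely made up):

```python
# Minimal sketch of the greedy scoring/allocation idea described above.
# All applicants and figures are illustrative, not real data.
from dataclasses import dataclass

@dataclass
class Applicant:
    name: str
    expected_output: float    # e.g. projects per funding period
    value_per_output: float   # expected value per unit of output
    funding_required: float

def allocate(applicants, budget):
    # score = (expected output * expected value per unit of output) / funding required
    ranked = sorted(
        applicants,
        key=lambda a: a.expected_output * a.value_per_output / a.funding_required,
        reverse=True,
    )
    funded = []
    for a in ranked:
        if a.funding_required <= budget:
            funded.append(a.name)
            budget -= a.funding_required
    return funded, budget  # leftover budget carries over to the next round

applicants = [
    Applicant("A", expected_output=6, value_per_output=10, funding_required=50),
    Applicant("B", expected_output=2, value_per_output=40, funding_required=30),
    Applicant("C", expected_output=4, value_per_output=5, funding_required=40),
]
print(allocate(applicants, budget=100))  # (['B', 'A'], 20)
```

Skipping the full knapsack optimization costs a little optimality, but as noted, leftover funds carry over to the next round.
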
Comment by omegastick (isaac-poulton) on Eliciting Credit Hacking Behaviours in LLMs · 2023-09-14T16:54:46.574Z · LW · GW

Agreed, this was an expected result. It's nice to have a functioning example to point to for LLMs in an RLHF context, though.

Comment by omegastick (isaac-poulton) on Is there something fundamentally wrong with the Universe? · 2023-09-13T12:40:26.235Z · LW · GW

From one perspective, nature does kind of incentivize cooperation in the long term. See The Goddess of Everything Else.

Comment by omegastick (isaac-poulton) on Defunding My Mistake · 2023-09-06T20:40:11.950Z · LW · GW

Is there a reason to believe this is likely? Outside of a strong optimization pressure for niceness (of which there is definitely some, but it's relatively weak compared to other optimization pressures), I'd expect these organizations to be of roughly average niceness for their situation.

Comment by omegastick (isaac-poulton) on Any research in "probe-tuning" of LLMs? · 2023-08-16T15:36:56.416Z · LW · GW

A quick Google search of probe tuning doesn't turn up anything. Do you have more info on it?

> Probe-tuning doesn't train on LLM's own "original rollouts" at all, only on LLM's activations during the context pass through the LLM.

This sounds like regular fine-tuning to me. Unless you mean that the loss is calculated based on one (or several?) of the network's activations rather than on the output logits.

Edit: I think I get what you mean now. You want to hook a probe to a model and fine-tune it to perform well as a probe classifier, right?
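
In case it helps pin down terminology, here's a minimal sketch of the simpler reading: a linear probe trained on frozen LLM activations (the model name, layer choice, and toy labels are all assumptions for illustration). "Probe-tuning" as I understand your question would additionally unfreeze the LLM and backpropagate the probe loss into its weights.

```python
# Minimal, illustrative sketch: train a linear probe on frozen LLM activations.
# Model, layer, and toy labels are placeholders, not a real experiment.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal LM that exposes hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()  # the LLM itself stays frozen here

probe = torch.nn.Linear(model.config.hidden_size, 2)  # 2-class linear probe
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

texts = ["the sky is blue", "the sky is made of cheese"]  # toy data
labels = torch.tensor([0, 1])

with torch.no_grad():  # activations only; no gradients through the LLM
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    hidden = model(**batch).hidden_states[-1]        # last-layer activations
    mask = batch["attention_mask"].unsqueeze(-1)
    features = (hidden * mask).sum(1) / mask.sum(1)  # mean-pool over real tokens

for _ in range(100):
    loss = loss_fn(probe(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# For "probe-tuning" as described in the question, you'd also add
# model.parameters() to the optimizer and compute the activations with gradients.
```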

Comment by omegastick (isaac-poulton) on The world where LLMs are possible · 2023-07-10T13:00:12.779Z · LW · GW

It's also possible that there is some elegant, abstract "intelligence" concept (analogous to arithmetic) which evolution built into us but we don't understand yet and from which language developed. It just turns out that if you already have language, it's easier to work backwards from there to "intelligence" than to build it from scratch.

Comment by omegastick (isaac-poulton) on Why it's so hard to talk about Consciousness · 2023-07-03T00:04:32.472Z · LW · GW

This probably isn't the case, but I secretly wonder if the people in camp #1 are p-zombies.

Comment by omegastick (isaac-poulton) on I Think Eliezer Should Go on Glenn Beck · 2023-06-30T10:46:53.002Z · LW · GW

Not very familiar with US culture here: is AI safety not extremely blue-tribe coded right now?

Comment by omegastick (isaac-poulton) on Why am I Me? · 2023-06-28T15:56:51.752Z · LW · GW

How does the logic here work if you change the question to be about human history?

Guessing a 50/50 coin flip is obviously impossible, but if Omega asks whether you are in the last 50% of "human history" the doomsday argument (not that I subscribe to it) is more compelling. The key point of the doomsday argument is that humanity's growth is exponential, therefore if we're the median birth-rank human and we continue to grow, we don't actually have that long (in wall-time) to live.
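
As a rough back-of-the-envelope (taking ~110 billion humans born so far and ~130 million births per year as ballpark assumptions), being the median birth-rank human at a constant birth rate leaves roughly

$$\frac{1.1 \times 10^{11} \text{ remaining births}}{1.3 \times 10^{8} \text{ births/year}} \approx 850 \text{ years}$$

of wall-time, and considerably less if the birth rate keeps growing.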

Comment by omegastick (isaac-poulton) on Self-experiment: A supraphysiological dosage of testosterone. · 2023-06-26T11:44:04.088Z · LW · GW

Please don't do this, this is dangerous.

How much Test E did you take? 200mg/ml, but how many ml?

Usually, one dose of testosterone isn't enough for a noticeable difference in mental state, and by the time it is enough you'll need a plan for managing mental side effects from your increased estrogen.

I'm usually a pretty big fan of bioengineering, self-experimentation, etc. but this strikes me as particularly reckless.

Comment by omegastick (isaac-poulton) on Lessons On How To Get Things Right On The First Try · 2023-06-20T17:37:56.163Z · LW · GW

Is anyone worried about AI one-shotting comprehensive nano-technology? It can make as many tries as it wants, and in fact, we'll be giving it as many tries as we can.

Comment by omegastick (isaac-poulton) on The Prospect of an AI Winter · 2023-03-28T17:52:03.800Z · LW · GW

I think the long gap between GPT-3 and GPT-4 can be explained by Chinchilla. That was the point where OpenAI realized their models were undertrained for their size, and switched focus from scaling to fine-tuning for a couple of years. InstructGPT, Codex, text-davinci-003, and GPT-3.5 were all released in this period.
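
For a rough sense of scale, the Chinchilla rule of thumb of ~20 training tokens per parameter (an approximation, not an exact figure) would put the compute-optimal data for a 175B-parameter model at roughly

$$20 \times 1.75 \times 10^{11} \approx 3.5 \times 10^{12} \text{ tokens},$$

versus the ~300B tokens GPT-3 was reportedly trained on, an order of magnitude less.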

Comment by omegastick (isaac-poulton) on What did you do with GPT4? · 2023-03-18T17:36:29.101Z · LW · GW

GPT-4 can handle tabletop RPGs incredibly well. You just have to ask it to DM a Dungeons and Dragons 5e game, give it some pointers about narrative style, game setting, etc. and you're off.

For the first couple of hours of play it's basically as good as a human, but annoyingly it starts to degrade after that, making more mistakes and forgetting things. I don't think it's a context length issue, because it forgets info that's definitely within context, but I can think of a few other things that could be the issue.

Comment by omegastick (isaac-poulton) on The Parable of the King and the Random Process · 2023-03-02T03:32:06.793Z · LW · GW

It seems implied that the chance of a drought here is 50%. If there is a 50% chance of basically any major disaster in the foreseeable future, the correct action is "Prepare Now!".

Comment by omegastick (isaac-poulton) on Ways to prepare to a vastly new world? · 2023-02-26T15:47:54.571Z · LW · GW

This advice also applies to the aligned case. And all of the in-betweens. And to most other scenarios.

Comment by omegastick (isaac-poulton) on In Defense of Chatbot Romance · 2023-02-13T07:32:12.396Z · LW · GW

Disclaimer: I run an "AI companion" app, which has fulfilled the role of a romantic partner for a handful of people.

This is the main benefit I see of talking about your issues with an AI. Current-gen (RLHF-tuned) LLMs are fantastic at therapy-esque conversations, acting as a mirror that lets the human reflect on their own thoughts and beliefs. Their weak point (as a conversational partner) right now is a lack of agency and of consistent views of their own, but that's not what everyone needs.

Comment by omegastick (isaac-poulton) on Two very different experiences with ChatGPT · 2023-02-07T18:02:34.137Z · LW · GW

I think you're on to something with the "good lies" vs "bad lies" part, but I'm not so sure about your assertion that ChatGPT only looks at how closely the surface level words in the prompt match the subject of interest.

"LLMs are just token prediction engines" is a common, but overly reductionist viewpoint. They commonly reason on levels above basic token matching, and I don't see much evidence that that's what's causing the issue here.

Comment by omegastick (isaac-poulton) on "Heretical Thoughts on AI" by Eli Dourado · 2023-01-20T11:36:56.151Z · LW · GW

Could you explain the difference between "the same product getting cheaper" and "the same cost buying more"?

Comment by omegastick (isaac-poulton) on "Heretical Thoughts on AI" by Eli Dourado · 2023-01-20T06:47:43.573Z · LW · GW

I think #1 is the most important here. I'm not a professional economist, so someone please correct me if I'm wrong.

My understanding is that TFP is calculated based on nominal GDP, rather than real GDP, meaning the same products and services getting cheaper doesn't affect the growth statistic. Furthermore, although the formulation in the TFP paper has a term for "labor quality", in practice that's ignored because it's very difficult to calculate, making the actual calculation roughly (GDP / hours worked). All this means that it's pretty unsuitable as a measure of how well a technology like the Internet (or AI) improves productivity.
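
For reference, the textbook growth-accounting (Solow residual) formulation is roughly

$$\frac{\Delta A}{A} \approx \frac{\Delta Y}{Y} - \alpha \frac{\Delta K}{K} - (1 - \alpha) \frac{\Delta L}{L},$$

where $Y$ is output, $K$ capital input, $L$ labour input (hours, ideally quality-adjusted), and $\alpha$ the capital share. When the capital and labour-quality terms are dropped or poorly measured, the residual collapses toward output per hour worked.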

TFP (utilization adjusted even more so) is very useful for measuring impacts of policies, shifts in average working hours, etc. But the main thing it tells us about technology is "technology hasn't reduced average working hours". If you use real GDP instead, you'll see that exponential growth continues as expected.

Comment by omegastick (isaac-poulton) on How confident are we that there are no Extremely Obvious Aliens? · 2022-05-01T15:58:28.281Z · LW · GW

> this should show up as a completely dark sphere in the universe

Which, notably, we do see (https://en.m.wikipedia.org/wiki/Boötes_void). Though such voids don't conflict with our models of how the universe would end up naturally.

Comment by omegastick (isaac-poulton) on Why pessimism sounds smart · 2022-04-26T10:12:42.840Z · LW · GW

100% this. Some optimists make money, some get scammed.

Comment by omegastick (isaac-poulton) on Nudging My Way Out Of The Intellectual Mosh Pit · 2022-01-30T18:28:23.094Z · LW · GW

Reporting back two weeks later: my phone usage is down about 25%, but that's within my usual variance. If there's an effect, it's small enough not to be immediately obvious, and I'd need more data to get anything resembling a low p-value.

Anecdotally, though, I'm quite liking having my phone on "almost-greyscale" (chromatic reading mode on my OnePlus phone). When I have to turn it off, the colours feel overwhelming. It also feels like it encourages me to focus on the real world, rather than staring at my phone in a public place.

Comment by omegastick (isaac-poulton) on Nudging My Way Out Of The Intellectual Mosh Pit · 2022-01-17T21:12:07.012Z · LW · GW

Interesting. Complete greyscale sounds like a lot of hassle, but I'm going to try turning the contrast on my phone down to nearly zero and see if I notice any difference.

Comment by omegastick (isaac-poulton) on Nudging My Way Out Of The Intellectual Mosh Pit · 2022-01-16T22:16:13.748Z · LW · GW

I'm curious about your reasons for making your monitors greyscale. What are the benefits of that for you?

Comment by omegastick (isaac-poulton) on The Unreasonable Feasibility Of Playing Chess Under The Influence · 2022-01-13T17:29:13.934Z · LW · GW

If I'm not mistaken (and I'm not a biologist so I might be), alcohol mainly impacts the brain's system 2, leaving system 1 relatively intact. That lines up well with this post.

Comment by omegastick (isaac-poulton) on Brain Efficiency: Much More than You Wanted to Know · 2022-01-07T20:37:27.509Z · LW · GW

If EfficientZero-9000 is using 10,000 times the energy of John von Neumann, and thinks 1,000 times faster, it's actually 10 times less energy efficient.
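
Spelling out the arithmetic: energy used per unit of thinking, relative to von Neumann, is

$$\frac{10{,}000 \text{ (energy multiple)}}{1{,}000 \text{ (speed multiple)}} = 10.$$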

The point of this post is that there is some small amount of evidence that you can't make a computer think significantly faster, or better, than a brain without potentially critical trade-offs.

Comment by omegastick (isaac-poulton) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-11-04T15:46:16.544Z · LW · GW

I don't agree with Eliezer here. I don't think we have a deep enough understanding of consciousness to make confident predictions about what is and isn't conscious beyond "most humans are probably conscious sometimes".

The hypothesis that consciousness is an emergent property of certain algorithms is plausible, but only that.

If that turns out to be the case, then whether or not humans, GPT-3, or sufficiently large books are capable of consciousness depends on the details of the requirements of the algorithm.

Comment by omegastick (isaac-poulton) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-11-01T11:13:28.205Z · LW · GW

If I'm not mistaken, that book is behaviourally equivalent to the original algorithm but is not the same algorithm. From an outside view, they have different computational complexity. There are a number of different ways of defining program equivalence, but equivalence is different from identity. A is equivalent to B doesn't mean A is B.

See also: the Chinese Room argument.

Comment by omegastick (isaac-poulton) on Dating profiles from first principles: heterosexual male profile design · 2021-10-25T13:58:32.056Z · LW · GW

While it's important to bear in mind the possibility that you're not as below average as you think, I don't know your case so I will assume you're correct in your assessment.

Perhaps give up on online dating. "Offline" dating is significantly more forgiving than online.

Comment by omegastick (isaac-poulton) on Truthful AI: Developing and governing AI that does not lie · 2021-10-19T14:00:02.952Z · LW · GW

I think this touches on the issue of the definition of "truth". A society designates something as "true" when the majority of people in that society believe it to be true.

Using the techniques outlined in this paper, we could regulate AIs so that they only tell us things we define as "true". At the same time, a 16th century society using these same techniques would end up with an AI that tells them to use leeches to cure their fevers.

What is actually being regulated isn't "truthfulness", but "accepted by the majority-ness".

This works well for things we're very confident about (mathematical truths, basic observations), but begins to fall apart once we reach even slightly controversial topics. This is exacerbated by the fact that even seemingly simple issues are often actually quite controversial (astrology, flat earth, etc.).

This is where the "multiple regulatory bodies" part comes in. If we have a regulatory body that says "X, Y, and Z are true" and the AI passes their test, you know the AI will give you answers in line with that regulatory body's beliefs.

There could be regulatory bodies covering the whole spectrum of human beliefs, giving you a precise measure of where any particular AI falls within that spectrum.

Comment by omegastick (isaac-poulton) on Cup-Stacking Skills (or, Reflexive Involuntary Mental Motions) · 2021-10-13T16:41:24.586Z · LW · GW

I wonder if this makes any testable predictions. It seems to be a plausible explanation for how some people are extremely good at some reflexive mental actions, but not the only one. It's also plausible that some people are "wired" that way from birth, or that a single or small number of developmental events lead to them being that way (rather than years of involuntary practice).

I suppose that if the hypothesis laid out in this post is true, we'd expect people to get significantly better at some of these "cup-stacking" skills within a few years of being in an environment that builds them. Perhaps it could be tested by seeing if people get significantly better at the "soft skills" required to succeed in an office after a few years of working in one.

Comment by omegastick (isaac-poulton) on How much slower is remote work? · 2021-10-08T15:01:49.043Z · LW · GW

Specialising days like that seems like a good idea at first glance, but I get the feeling I'd burn out on meetings pretty quick if all my week's meetings were scheduled on one day. Being able to use a meeting as a break from concrete thinking to switch to more abstract thinking for a while is very refreshing.

Comment by omegastick (isaac-poulton) on The Towel Census: A Methodology for Identifying Orphaned Objects in Your Home · 2019-12-23T04:36:46.556Z · LW · GW

IMO, this is pretty necessary in any shared space. My company does this twice a year for the office umbrella rack, fridge, and cupboard.

Comment by omegastick (isaac-poulton) on What's going on with this failure of Bayes to converge? · 2019-12-19T08:15:42.632Z · LW · GW

This highlights an interesting case where pure Bayesian reasoning fails. While the chance of it occurring randomly is very low (though it may rise when you consider how many chances it has to occur), it is trivial to construct. Furthermore, it potentially applies in any case where we have two possibilities, one of which continually becomes more probable while the other shrinks but persistently doesn't disappear.

Suppose you are a police detective investigating a murder. There are two suspects: A and B. A doesn't have an alibi, while B has a strong one (time stamped receipts from a shop on the other side of town). A belonging of A's was found at the crime scene (which he claims was stolen). A has a motive: he had a grudge against the victim, while B was only an acquaintance.

A naive Bayesian (in both senses) would, with each observation, assign higher and higher probabilities to A being the culprit. In the end, though, it turns out that B committed the crime to frame A. He chose a victim A had a grudge against, planted A's belonging, and forged the receipts.
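
To make the update concrete, here's a toy numeric version of that scenario (all probabilities invented for illustration), where the likelihood model simply has no "B framed A" hypothesis:

```python
# Illustrative numbers only: a naive likelihood model with no "B framed A"
# hypothesis pushes the posterior further toward A with every observation.
prior = {"A": 0.5, "B": 0.5}

# P(observation | culprit) under the naive model
naive_likelihoods = [
    {"A": 0.8, "B": 0.10},  # A has no alibi; B has receipts
    {"A": 0.7, "B": 0.05},  # A's belonging found at the scene
    {"A": 0.6, "B": 0.20},  # A has a motive; B was only an acquaintance
]

posterior = dict(prior)
for likelihood in naive_likelihoods:
    unnormalized = {s: posterior[s] * likelihood[s] for s in posterior}
    total = sum(unnormalized.values())
    posterior = {s: p / total for s, p in unnormalized.items()}
    print(posterior)  # P(A) climbs toward ~0.997 even though B did it
```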

It's worth noting that, assuming your priors are accurate, given enough evidence you *will* converge on the correct probabilities. Actually acquiring that much evidence in practice isn't anywhere near guaranteed, however.

Comment by omegastick (isaac-poulton) on What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. · 2019-12-05T19:17:28.744Z · LW · GW

IMO, this is a better way of splitting up the argument that we should be funding AI safety research than the one presented in the OP. My only gripe is with point 2. Many would argue that it wouldn't be really bad, for a variety of reasons, such as that there are likely to be other 'superintelligent AIs' working in our favour. Alternatively, if the decision-making were only marginally better than a human's, it wouldn't be any worse than a small group of people working against humanity.