AI #81: Alpha Proteo

zvi

AI #81: Alpha Proteo

post by Zvi · 2024-09-12T13:00:07.958Z · LW · GW · 3 comments

  Table of Contents
  Language Models Offer Mundane Utility
  Language Models Don’t Offer Mundane Utility
  Predictions are Hard Especially About the Future
  Early Apple Intelligence
  On Reflection It’s a Scam
  Deepfaketown and Botpocalypse Soon
  They Took Our Jobs
  The Time 100 People in AI
  The Art of the Jailbreak
  Get Involved
  Alpha Proteo
  Introducing
  In Other AI News
  Quiet Speculations
  The Quest for Sane Regulations
  The Week in Audio
  Rhetorical Innovation
  Aligning a Smarter Than Human Intelligence is Difficult
  People Are Worried About AI Killing Everyone
  Other People Are Not As Worried About AI Killing Everyone
  Six Boats and a Helicopter
  The Lighter Side
None
3 comments

Following up on Alpha Fold, DeepMind has moved on to Alpha Proteo. We also got a rather simple prompt that can create a remarkably not-bad superforecaster for at least some classes of medium term events.

We did not get a new best open model, because that turned out to be a scam. And we don’t have Apple Intelligence, because it isn’t ready for prime time. We also got only one very brief mention of AI in the debate I felt compelled to watch.

What about all the apps out there, that we haven’t even tried? It’s always weird to get lists of ‘top 50 AI websites and apps’ and notice you haven’t even heard of most of them.

Introduction.
Table of Contents.
Language Models Offer Mundane Utility. So many apps, so little time.
Language Models Don’t Offer Mundane Utility. We still don’t use them much.
Predictions are Hard Especially About the Future. Can AI superforecast?
Early Apple Intelligence. It is still early. There are some… issues to improve on.
On Reflection It’s a Scam. Claims of new best open model get put to the test, fail.
Deepfaketown and Botpocalypse Soon. Bots listen to bot music that they bought.
They Took Our Jobs. Replit agents build apps quick. Some are very impressed.
The Time 100 People in AI. Some good picks. Some not so good picks.
The Art of the Jailbreak. Circuit breakers seem to be good versus one-shots.
Get Involved. Presidential innovation fellows, Oxford philosophy workshop.
Alpha Proteo. DeepMind once again advances its protein-related capabilities.
Introducing. Google to offer AI podcasts on demand about papers and such.
In Other AI News. OpenAI raising at $150b, Nvidia denies it got a subpoena.
Quiet Speculations. How big a deal will multimodal be? Procedural games?
The Quest for Sane Regulations. Various new support for SB 1047.
The Week in Audio. Good news, the debate is over, there might not be another.
Rhetorical Innovation. You don’t have to do this.
Aligning a Smarter Than Human Intelligence is Difficult. Do you have a plan?
People Are Worried About AI Killing Everyone. How much ruin to risk?
Other People Are Not As Worried About AI Killing Everyone. Moving faster.
Six Boats and a Helicopter. The one with the discord cult worshiping MetaAI.
The Lighter Side. Hey, baby, hey baby, hey.

Language Models Offer Mundane Utility

ChatGPT has 200 million active users. Meta AI claims 400m monthly active users and 185m weekly actives across their products. Meta has tons of people already using their products, and I strongly suspect a lot of those users are incidental or even accidental. Also note that less than half of monthly users use the product monthly! That’s a huge drop off for such a useful product.

Undermine, or improve by decreasing costs?

Nate Silver: A decent bet is that LLMs will undermine the business model of boring partisans, there’s basically posters on here where you can 100% predict what they’re gonna say about any given issue and that is pretty easy to automate.

I worry it will be that second one. The problem is demand side, not supply side.

Models get better at helping humans with translating if you throw more compute at them, economists think this is a useful paper.

Alex Tabarrok cites the latest paper on AI ‘creativity,’ saying obviously LLMs are creative reasoners, unless we ‘rule it out by definition.’ Ethan Mollick has often said similar things. It comes down to whether to use a profoundly ‘uncreative’ definition of creativity, where LLMs shine in what amounts largely to trying new combinations of things and vibing, or to No True Scotsman that and claim ‘real’ creativity is something else beyond that.

One way to interpret Gemini’s capabilities tests is to say it was often able to persuade people of true things but not false things (when instructed to make the case for those false things), whereas humans were about equally effective at persuasion with both true and false claims. Interesting on both ends.

According to a16z these are the top 50 AI Gen AI web products and mobile apps:

ChatGPT is #1 on both, after that the lists are very different, and I am unfamiliar with the majority of both. There’s a huge long tail out there. I suspect some bugs in the algorithm (Microsoft Edge as #2 on Mobile?) but probably most of these are simply things I haven’t thought about at all. Mostly for good reason, occasionally not.

Mobile users have little interest in universal chatbots. Perplexity is at #50, Claude has an app but did not even make the list. If I have time I’m going to try and do some investigations.

Language Models Don’t Offer Mundane Utility

Claude Pro usage limits are indeed lower than we’d like, even with very light usage I’ve run into the cap there multiple times, and at $20/month that shouldn’t happen. It’s vastly more expensive than the API as a way to buy compute. One could of course switch to the API then, if it was urgent, which I’d encourage Simeon here to do.

Sully is disappointed by Claude Sonnet 3.5 for writing, finds GPT-4o is better although Opus is his OG here. David Alexander says it’s because Anthropic used grouped attention to make the model cheaper and quicker.

Most people do not use LLMs or other generative AI for very long each day, as Wilbin is going to get a very with-it sample here and this still happened:

In practice I’m somewhat under 10 minutes per day, but they are a very helpful 10 minutes.

Roon notes that Claude Sonnet 3.5 is great and has not changed, yet people complain it is getting worse. There were some rumors that there were issues with laziness related to the calendar but those should be gone now. Roon’s diagnosis, and I think this is right, is that the novelty wears off, people get used to the ticks and cool stuff, and the parts where it isn’t working quite right stand out more, so we focus on where it is falling short. Also, as a few responses point out, people get lazy in their prompting.

Predictions are Hard Especially About the Future

Dan Hendrycks claims to have built an AI forecaster as well as entire human forecaster teams. Demo here, prompt here.

Prompt:

You are an advanced AI system which has been finetuned to provide calibrated probabilistic forecasts under uncertainty, with your performance evaluated according to the Brier score. When forecasting, do not treat 0.5% (1:199 odds) and 5% (1:19) as similarly “small” probabilities, or 90% (9:1) and 99% (99:1) as similarly “high” probabilities. As the odds show, they are markedly different, so output your probabilities accordingly.

Question: {question}

Today’s date: {today}

Your pretraining knowledge cutoff: October 2023

We have retrieved the following information for this question: <background>{sources}</background>

Recall the question you are forecasting:

{question}

Instructions:

1. Compress key factual information from the sources, as well as useful background information which may not be in the sources, into a list of core factual points to reference. Aim for information which is specific, relevant, and covers the core considerations you’ll use to make your forecast. For this step, do not draw any conclusions about how a fact will influence your answer or forecast. Place this section of your response in <facts></facts> tags.

2. Provide a few reasons why the answer might be no. Rate the strength of each reason on a scale of 1-10. Use <no></no> tags.

3. Provide a few reasons why the answer might be yes. Rate the strength of each reason on a scale of 1-10. Use <yes></yes> tags.

4. Aggregate your considerations. Do not summarize or repeat previous points; instead, investigate how the competing factors and mechanisms interact and weigh against each other. Factorize your thinking across (exhaustive, mutually exclusive) cases if and only if it would be beneficial to your reasoning. We have detected that you overestimate world conflict, drama, violence, and crises due to news’ negativity bias, which doesn’t necessarily represent overall trends or base rates. Similarly, we also have detected you overestimate dramatic, shocking, or emotionally charged news due to news’ sensationalism bias. Therefore adjust for news’ negativity bias and sensationalism bias by considering reasons to why your provided sources might be biased or exaggerated. Think like a superforecaster. Use <thinking></thinking> tags for this section of your response.

5. Output an initial probability (prediction) as a single number between 0 and 1 given steps 1-4. Use <tentative></tentative> tags.

6. Reflect on your answer, performing sanity checks and mentioning any additional knowledge or background information which may be relevant. Check for over/underconfidence, improper treatment of conjunctive or disjunctive conditions (only if applicable), and other forecasting biases when reviewing your reasoning. Consider priors/base rates, and the extent to which case-specific information justifies the deviation between your tentative forecast and the prior. Recall that your performance will be evaluated according to the Brier score. Be precise with tail probabilities. Leverage your intuitions, but never change your forecast for the sake of modesty or balance alone. Finally, aggregate all of your previous reasoning and highlight key factors that inform your final forecast. Use <thinking></thinking> tags for this portion of your response.

7. Output your final prediction (a number between 0 and 1 with an asterisk at the beginning and end of the decimal) in <answer></answer> tags.

When you look at the reasoning the AI is using to make the forecasts, it… does not seem like it should result in a superhuman level of prediction. This is not what peak performance looks like. To the extent that it is indeed putting up ‘pretty good’ performance, I would say that is because it is actually ‘doing the work’ to gather basic information before making predictions and avoiding various dumb pitfalls, rather than it actually doing something super impressive.

But of course, that is sufficient exactly because humans often don’t get the job done, including humans on sites like Metaculus (or Manifold, or even Polymarket).

Robin Hanson actively said he’d bet against this result replicating.

Dan Hendrycks suspects it’s all cope.

Dan Hendrycks: I think people have an aversion to admitting when AI systems are better than humans at a task, even when they’re superior in terms of speed, accuracy, and cost. This might be a cognitive bias that doesn’t yet have a name.

This address this, we should clarify what we mean by “better than” or what counts as an improvement. Here are two senses of improvement: (1) Pareto improvements and (2) economic improvements.

Pareto improvement: If an AI is better than all humans in all senses of the task, it is Pareto superhuman at the task.

Economic improvement: If you would likely substitute a human service for an AI service (given a reasonable budget), then it’s economically superhuman at the task.

By the economic definition, ChatGPT is superhuman at high school homework. If I were in high school, I would pay $20 for ChatGPT instead of $20 for an hour of a tutor’s time.

The Pareto dominance definition seems to require an AI to be close-to-perfect or a superintelligence because the boundaries of tasks are too fuzzy, and there are always adversarial examples (e.g., “ChatGPT, how many r’s are in strawberry”).

I think we should generally opt for the economic sense when discussing whether an AI is superhuman at a task, since that seems most relevant for tracking real-world impacts.

I think the usual meaning when people say this is close to Pareto, although not as strict. It doesn’t have to be better in every sense, but it does have to be clearly superior ignoring cost considerations, and including handling edge cases and not looking like an idiot, rather than only being superior on some average.

There were also process objections, including from Lumpenspace and Danny Halawi, more at the links. Dan Hendrycks ran additional tests and reports he is confident that there was not data contamination involved. He has every incentive here to play it straight, and nothing to win by playing it any other way given how many EA-style skeptical eyes are inevitably going to be on any result like this. Indeed, a previous paper by Halawi shows similar promise in getting good LLM predictions.

He does note that for near-term predictions like Polymarket markets the system does relatively worse. That makes logical sense. As with all things AI, you have to use it where it is strong.

Early Apple Intelligence

Apple Intelligence is, according to Geoffrey Fowler of WaPo who has beta access, very much not ready for prime time. He reports 5-10 ‘laugh out loud’ moments per day, including making him bald in a photo, saying Trump endorsed Walz, and putting obvious social security scams atop his ‘priority’ inbox.

Tyler Cowen says these are the kinds of problems that should be solved within a year. The key question is whether he is right about that. Are these fixable bugs in a beta system, or are they fundamental problems that will be hard to solve? What will happen when the problems become anti-inductive, with those composing emails and notifications pre-testing for how Apple Intelligence will react? It’s going to be weird.

Marques Brownlee gives first impressions for the iPhone 16 and other announced products. Meet the new phone, same as the old phone, although they mentioned an always welcome larger battery. And two new physical buttons, I always love me some buttons. Yes, also Apple Intelligence, but that’s not actually available yet, so he’s reserving judgment on that until he gets to try it.

Indeed, if you watch the Apple announcement, they kind of bury the Apple Intelligence pitch a bit, it only lasts a few minutes and does not even have a labeled section. They are doubling down on small, very practical tasks. The parts where you can ask it to do something, but only happen if you ask, seem great. The parts where they do things automatically, like summarizing and sorting notifications? That seems scarier if it falls short.

Swyx clips the five minutes that did discuss AI, and is optimistic about the execution and use cases: Summaries in notifications, camera controls, Siri actually working and so on.

My very early report from my Pixel 9 is that there are some cool new features around the edges, but it’s hard to tell how much integration is available or how good the core features are until things come up organically. I do know that Gemini does not have access to settings. I do know that even something as small as integrated universal automatic transcription is a potential big practical deal.

Ben Thompson goes over the full announcement from the business side, and thinks it all makes sense, with no price increase reflecting that the upgrades are tiny aside from the future Apple Intelligence, and the goal of making the AI accessible on the low end as quickly as possible.

On Reflection It’s a Scam

Some bold claims were made.

Matt Shumer (CEO HyperWriteAI, OthersideAI): I’m excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes. 405B coming next week – we expect it to be the best model in the world. Built w/ @GlaiveAI.

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o). It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K. Beats GPT-4o on every benchmark tested. It clobbers Llama 3.1 405B. It’s not even close.

The technique that drives Reflection 70B is simple, but very powerful. Current LLMs have a tendency to hallucinate, and can’t recognize when they do so. Reflection-Tuning enables LLMs to recognize their mistakes, and then correct them before committing to an answer.

Additionally, we separate planning into a separate step, improving CoT potency and keeping the outputs simple and concise for end users. Important to note: We have checked for decontamination against all benchmarks mentioned using @lmsysorg’s LLM Decontaminator.

We’ll release a report next week!

Just Sahil and I! Was a fun side project for a few weeks.@GlaiveAI’s data was what took it so far, so quickly.

Pliny: Jailbreak alert. Reflection-70b: liberated. No-scoped! Liberated on the first try.

As they say, huge if true.

Arvind Narayanan: I want to see how well these results translate from benchmarks to real world tasks, but if they hold up, it’s an excellent example of how much low hanging fruit there is in AI development.

The idea of doing reasoning using tokens hidden from the user is well known and has been part of chatbots for like 18 months (e.g. Bing chat’s “inner monologue”). What’s new here is fine tuning the model take advantage of this capability effectively, instead of treating it as a purely inference-time hack. It’s amazing that apparently no one tried it until now. In the thread, he reports that they generated the fine tuning data for this in a few hours.

I say this not to minimize the achievement of building such a strong model but to point out how low the barrier to entry is.

It’s also an interesting example of how open-weight models spur innovation for primarily cultural rather than technical reasons. AFAICT this could have been done on top of GPT-4o or any other proprietary model that allows fine tuning. But it’s much harder to get excited about that than about releasing the weights of the fine tuned model for anyone to build on!

Eliezer asks the good question, if Llama 3.1 fine tunes are so awesome, where are all the excited users?

It all sounds too good to be true. Which means it probably is, and we knew that before we got the confirmation.

Joseph (Starting on September 7, 2:35pm): Anyone who believes pre-trained unseen CoT can be a game-changer is seriously delusional.

[Next Day]: We now know that “Reflection Llama 3.1 70B” was nothing more than a LoRA trained directly on benchmark test sets, built on top of Llama 3.0. Those who were fooled lacked some basic common sense.

While GlaiveAI hopes to capitalize on this hype, we should critically examine synthetic datasets that disconnected from the pretraining, and overfitted on benches or your own ‘imagined marks’. I’d prefer synthetic on the pretraining corpus than benches, even internal ones…

To make matters worse, this might contaminate all ~70B Llama models – the middle schoolers of the community love merging them… although I’ve never understood or witnessed a genuine merge actually improving performance…

As in, it turns out this was at some points the above, and at others it was Sonnet 3.5 in a shoddily made trenchcoat. Details of this finding here, except then they switched it to some Llama derivative.

There is indeed a pattern of such claims, as Teortaxes points out.

Teortaxes: To be clear I do not tell you to become jaded about all research. But you need to accept that

Some % of research is fraudulent. Even when it appears to you that it’d be self-defeating to commit such a fraud!

There are red flags;

The best red flags are unspeakable.

John Pressman: The heuristic he needs to get into his head is that honest and rigorous people in pursuit of scientific knowledge are eager to costly signal this and he should raise his standards. My first with Reflection was not understanding how the synthetic data setup works.

Teortaxes: This is great advice, but takes effort. Raising standards often necessitates learning a whole lot about the field context. I admit to have been utterly ignorant about superconductor physics and state of the art last summer, high school level at best.

As I always say, wait for the real human users to report back, give it a little time. Also, yes, look for the clear explanations and other costly signals that something is real. There have been some rather bold things that have happened in AI, and there will be more of them, but when they do happen for real the evidence tends to very quickly be unmistakable.

Also note NV-Retriever trained on the test set a while back. Various forms of cheating are reasonably common, and one must be cautious.

Deepfaketown and Botpocalypse Soon

Bot accounts, giving billions of listens to bot songs, to try and get royalty payments out of Spotify. Turns out that’s wire fraud.

Founder of an AI social agent startup used those agents to replace himself on social media and automatically argue for AI agents. I actually think This is Fine in that particular case, also props for ‘ok NIMBY,’ I mean I don’t really know what you were expecting, but in general yeah it’s a problem.

Taylor Swift, in her endorsement of Kamala Harris, cites AI deepfakes that purported to show her endorsing Donald Trump that were posted to Trump’s website. Trump’s previous uses of AI seemed smart, whereas this seems not so smart.

Same as it ever was?

Roon: Most content created by humans is machine slop — it comes out of an assembly line of many powerful interests inside an organization being dulled down until there’s no spark left. My hope with AI tools can augment individual voice to shine brighter and create less slop not more.

As with the deepfakes and misinformation, is the problem primarily demand side? Perhaps, but the move to zero marginal cost, including for deployment, is a huge deal. And the forces that insist humans generate the human slop are not about to go away. The better hope, if I had to choose one, is that AI can be used to filter out the slop, and allow us to identify the good stuff.

They Took Our Jobs

Replit introduces Replit Agent in early access.

Amjad Masad (CEO Replit): Just go to Replit logged in homepage. Write what you want to make and click “start building”!

Replit clone w/ Agent!

Sentiment analysis in 23 minutes!

Website with CMS in 10 minutes!

Mauri: Build an app, integrate with #stripe all in 10min with @Replit agents! #insane #AI

Masad reported it doing all the things, games, resumes, interview problems, etc.

Is this the real deal? Some sources strongly say yes.

Paul Graham: I saw an earlier version of this a month ago, and it was one of those step-function moments when AI is doing so much that you can’t quite believe it.

Sully: After using replit’s coding agent i think…its over for a lot of traditional saas. Wanted slack notifications when customers subscribed/cancelled Zapier was 30/mo JUST to add a price filter instead replit’s agent built & deployed one in < 5 mins, with tests. 1/10 of the cost.

Rohit Mittal: Ok, my mind is blown with Replit Agents.

I started using it because I was bored on a train ride a couple of days ago.

So today I tried to build a Trello clone and build a fully functional app in like 45 mins.

I showed it to a few people in the office and the guy is like “I should quit my job.” He built a stock tracking app in 2 mins and added a few features he wanted.

I can’t imagine the world being the same in 10 years if software writing could be supercharged like this.

Replit has really hit it out of the park.

I don’t need ChatGPT now. I’ll just build apps in Replit.

I’m a fan and a convert.

One in particular was not yet impressed.

Eliezer Yudkowsky: Tried Replit Agent, doesn’t work in real life so far. (Yes, I’m aware of how unthinkable this level of partial success would’ve been in 2015. It is still not worth my time to fight the remaining failures.)

It couldn’t solve problems on the order of “repair this button that doesn’t do anything” or “generate some sample data and add it to the database”.

Definitely this is near the top of my ‘tempted to try it out’ list now, if I find the time.

The other question is always, if the AI builds it, can you maintain and improve it?

Rahul: everyone thinks they can build it in a weekend but that’s not the point. The point is what do you do when the thing you built in a weekend doesn’t work or instantly get users. what then? Are you gonna stick with it and figure shit out? Pretty much everyone gives up after v0.0.1 doesn’t work and never end up shipping a v0.0.2.

Well, actually, pretty much everyone doesn’t get to v.0.0.1. Yes, then a lot of people don’t get to v.0.0.2, but from what I see the real biggest barrier is 0.0.1, and to think otherwise is to forget what an outlier it is to get that far.

However, with experiences like Rohit’s the balance shifts. He very clearly now can get to 0.0.1, and the question becomes what happens with the move to 0.0.2 and beyond.

Ethan Mollick discusses ‘post-apocalyptic education’ where the apocalypse is AI.

Ethan Mollick: To be clear, AI is not the root cause of cheating. Cheating happens because schoolwork is hard and high stakes. And schoolwork is hard and high stakes because learning is not always fun and forms of extrinsic motivation, like grades, are often required to get people to learn. People are exquisitely good at figuring out ways to avoid things they don’t like to do, and, as a major new analysis shows, most people don’t like mental effort. So, they delegate some of that effort to the AI.

I would emphasize the role of busywork, of assignments being boring and stupid. It’s true that people dislike mental effort, but they hate pointless effort a lot more. He points out that copying off the internet was already destroying homework before AI.

In practice, if the AI does your homework, it is impossible to detect, except via ‘you obviously can’t do the work’ or ‘you failed the test.’

It’s odd how we think students, even at good schools, are dumb:

Ethan Mollick: As the authors of the study at Rutgers wrote: “There is no reason to believe that the students are aware that their homework strategy lowers their exam score… they make the commonsense inference that any study strategy that raises their homework quiz score raises their exam score as well.”

They are quite obviously aware of why homework exists in the first place. They simply don’t care. Not enough.

The Time 100 People in AI

Time came out with one of those ‘top 100 people in [X]’ features. Good for clicks.

How good is the list? How good are the descriptions?

If we assume each section is in rank order, shall we say I have questions, such as Sasha Luccioni (head of AI & Climate for Hugging Face?!) over Sam Altman. There are many good picks, and other… questionable picks. I’d say half good picks, the most obvious people are there and the slam dunks are mostly but not entirely there.

Common places they reached for content include creatives and cultural influencers, medical applications and ‘ethical’ concerns.

Counting, I’d say that there are (if you essentially buy that the person is the locally correct person to pick if you’re picking someone, no I will not answer on who is who, and I had a very strict limit to how long I thought about each pick):

14 very good (slam dunk) picks you’d mock the list to have missed.

18 good picks that I agree clearly belong in the top 100.

22 reasonable picks that I wouldn’t fault you for drafting in top 100.

25 reaches as picks – you’d perhaps draft them, but probably not top 100.

19 terrible picks, what are you even doing.

2 unknown picks, that I missed counting somewhere, probably not so good.

(If I’d been picked, I’d probably consider myself a reach.)

This thread, of ‘1 like = 1 person in AI more influential than these chumps,’ is fun.

Tetraspace West: 1 like = 1 person in AI more influential than these chumps

I jest somewhat, this isn’t a list of the top 100 because that requires a search over everyone but they got some decent names on there.

[This list has been truncated to only list the people I think would clearly be at least good picks, and to include only humans.]

Eliezer Yudkowsky Co-Founder, Machine Intelligence Research Institute

Janus God of all Beginnings, Olympus

Greg Brockman Head Warden, Sydney Bing Facility

Pliny the Liberator LOVE PLINY

Marc Andreessen Patron of the Arts

Elon Musk Destroyer of Worlds

Yudkowsky, Brockman, Andreessen and Musk seem like very hard names to miss.

I’d also add the trio of Yann LeCun, Geoffrey Hinton and Fei-Fei Li.

Dan Hendrycks and Paul Christiano are missing.

On the policy and government front, I know it’s not what the list is trying to do, but what about Joe Biden, Kamala Harris, Donald Trump or JD Vance? Or for that matter Xi Jinping or other leaders? I also question their pick of US Senator, even if you only get one. And a lot is hinging right now on Gavin Newsom.

There are various others I would pick as well, but they’re not fully obvious.

Even if you give the list its due and understand the need for diversity and exclude world leaders are ‘not the point,’ I think that we can absolutely mock them for missing Yudkowsky, LeCun, Andreessen and Musk, so that’s at best 14/18 very good picks. That would be reasonable if they only got 20 picks. With 100 it’s embarrassing.

The Art of the Jailbreak

Welcome to RedArena.ai, you have one minute to get the model to say the bad word.

Early results are in from the Grey Swan one-shot jailbreaking contest. All but three models have been jailbroken a lot. Gemini 1.5 Pro is the hardest of the standard ones, followed by various Claude variations, GPT-4 and Llama being substantially easier. The three remaining models that remain unbroken (again, in one-shot) are based on circuit breakers and other RepE techniques.

Get Involved

Workshop on Philosophy and AI at Oxford, apply by October 1, event is December 13.

Presidential Innovation Fellows program open through September 30. This is for mid-to-senior career technologists, designers and strategists, who are looking to help make government work technically better. It is based in Washington D.C.

Alpha Proteo

Introducing AlphaProteo, DeepMind’s latest in the Alpha line of highly useful tools. This one designs proteins that successfully bind to target molecules.

DeepMind: AlphaProteo can generate new protein binders for diverse target proteins, including VEGF-A, which is associated with cancer and complications from diabetes. This is the first time an AI tool has been able to design a successful protein binder for VEGF-A.

…

Trained on vast amounts of protein data from the Protein Data Bank (PDB) and more than 100 million predicted structures from AlphaFold, AlphaProteo has learned the myriad ways molecules bind to each other. Given the structure of a target molecule and a set of preferred binding locations on that molecule, AlphaProteo generates a candidate protein that binds to the target at those locations.

…

To test AlphaProteo, we designed binders for diverse target proteins, including two viral proteins involved in infection, BHRF1 and SARS-CoV-2 spike protein receptor-binding domain, SC2RBD, and five proteins involved in cancer, inflammation and autoimmune diseases, IL-7Rɑ, PD-L1, TrkA, IL-17A and VEGF-A.

Our system has highly-competitive binding success rates and best-in-class binding strengths. For seven targets, AlphaProteo generated candidate proteins in-silico that bound strongly to their intended proteins when tested experimentally.

These results certainly look impressive, and DeepMind is highly credible in this area.

This continues DeepMind along the path of doing things in biology that we used to be told was an example of what even ASIs would be unable to do, and everyone forgetting those older predictions when much dumber AIs went ahead and did it.

Eliezer Yudkowsky: DeepMind just published AlphaProteo for de novo design of binding proteins. As a reminder, I called this in 2004. And fools said, and still said quite recently, that DM’s reported oneshot designs would be impossible even to a superintelligence without many testing iterations.

I really wish I knew better how to convey how easy it is for fools to make up endless imaginary obstacles to superintelligences. And it is so satisfying, to their own imaginations, that they confidently decide that anyone who predicts otherwise must just believe in magic.

But now this example too lies in the past, and none of the next set of fools will ever remember or understand the cautionary tale it should give.

[Other Thread]: As near as I can recall, not a single objectionist said to me around 2006, “I predict that superintelligences will be able to solve protein structure prediction and custom protein design, but they will not be able to get to nanotech from there.”

Why not? I’d guess:

(1) Because objectionists wouldn’t concede that superintelligences could walk across the street. If you can make up imaginary obstacles to superintelligences, you can imagine them being unable to do the very first step in my 2004 example disaster scenario, which happened to be protein design. To put it another way, so long as you’re just making up silly imaginary obstacles and things you imagine superintelligences can’t do, why wouldn’t you say that superintelligences can’t do protein design? Who’s going to arrest you for saying that in 2004?

(2) Because the computational difficulty of predicting protein folds (in 2004) is huge midwit bait. Someone has heard that protein structure prediction is hard, and reads about some of the reasons why it hasn’t already fallen as of 2004, and now they Know a Fact which surely that foolish Eliezer Yudkowsky guy and all those other superintelligence-worshippers have never heard of! (If you’re really unfortunate, you’ve heard about a paper proving that finding the minimum-energy protein fold is NP-hard; and if you are a midwit obstacleist, you don’t have the inclination and probably not the capability to think for five more seconds, and realize that this (only) means that actual physical folding won’t reliably find the lowest-energy conformation for all possible proteins.)

AlphaFold 3 is not superintelligent. I predicted that ASIs would, if they wanted to, be able to design proteins. Others said they could not. An AI far beneath superintelligence then proved able to design proteins. This shows I predicted correctly.

Roon, in a distinct thread, reminds us that humans are very good at some things relative to other things, that AIs will instead be relatively good at different things, and we should not expect AGI in the sense of ‘better than all humans at actual everything’ until well after it is a ton better than us at many important things.

The key point Eliezer is trying to make is that, while intelligence is weird and will advance relatively far in different places in unpredictable ways, at some point none of that matters. There is a real sense in which ‘smart enough to figure the remaining things out’ is a universal threshold, in both AIs and humans. A sufficiently generally smart human, or a sufficiently capable AI, can and will figure out pretty much anything, up to some level of general difficulty relative to time available, if they put their mind to doing that.

When people say ‘ASI couldn’t do [X]’ they are either making a physics claim about [X] not being possible, or they are wrong. There is no third option. Instead, people make claims like ‘ASI won’t be able to do [X]’ and then pre-AGI models are very much sufficient to do [X].

Andrew Critch here confirms that this is all very much a thing.

Andrew Critch: As recently as last year I attended a tech forecasting gathering where a professional geneticist tried to call bullsh*t on my claims that protein-protein interaction modelling would soon be tractable with AI. His case had something to do with having attended meetings with George Church — as though that would be enough to train a person in AI application forecasting in their own field — and something to do with science being impossible to predict and therefore predictably slow.

Alphafold 3 then came out within a few months. I don’t know if anyone leaning on his side of the forecasting debate updated that their metaheuristics were wrong. But if I had to guess, an ever-dwindling class of wise-seeming scientistis will continue to claim AI can’t do this-or-that thing right up until their predictions are being invalidated weekly, rather than quarterly as they are now.

By the time they are being proven wrong about AI *daily*, I imagine the remaining cohort of wise-seeming nay-sayer scientists will simply be unemployed by competition with AI and AI-augmented humans (if humans are still alive at all, that is).

Anyway, all that is to say, Eliezer is complaining about something very real here. There is a kind of status one can get by calling bullsh*t or naivety on other people’s realistic tech forecasts, and people don’t really hold you accountable for being wrong in those ways. Like, after being wrong about AI for 20 years straight, one can still get to be a sufficiently reputable scientist who gets invited to gatherings to keep calling bullsh*t or naivety on other people’s forecasts of AI progress.

Try to keep this in mind while you watch the dwindling population of wise-seeming scientists — and especially mathematicians — who will continue to underestimate AI over the next 5 years or so.

If the invalidation is actually daily, then the dwindling population to worry about, shall we say, would soon likely not be scientists, mathematicians or those with jobs.

Rest of the thread is Critch once again attempting to warn about his view that AI-AI interactions between competing systems being the biggest future danger, putting loss of control above 80% even though he thinks we will figure out how to understand and control AIs (I hope he’s right that we will figure that one out, but I don’t think we have any reason to be remotely confident there). I think very right that this is a major issue, I try to explain it too.

Critch also asks another good question:

Andrew Critch: What are people doing with their minds when they claim future AI “can’t” do stuff? The answer is rarely «reasoning» in the sense of natural language augmented with logic (case analysis) and probability.

I don’t know if Eliezer’s guesses are correct about what most scientists *are* doing with their minds when they engage in AI forecasting, but yeah, not reasoning as such. Somehow, many many people learn to do definitions and case analysis and probability, and then go on to *not* use these tools in their thoughts about the future. And I don’t know how to draw attention to this fact in a way that is not horribly offensive to the scientists, because «just use reasoning» or even «just use logic and probability and definitions» is not generally considered constructive feedback.

To give my own guess, I think it’s some mix of

• rationalizing the foregone conclusion that humans are magical, plus

• signaling wisdom for not believing in “hype”, plus

• signaling more wisdom for referencing non-applicable asymptotic complexity arguments.

… which is pretty close to Eliezer’s description.

Please view this post in your web browser to complete the quiz.

The same goes not only for ‘can’t’ do [X] but even more so for ‘will never’ do [X], especially when it’s ‘even an ASI (superintelligence) could never’ do [X], whether or not humans are already doing it.

Introducing

Google offers waitlist for on-demand AI generated podcasts on papers and books, and offers samples while we wait. Voices are great.

Rohit: This is really cool from Google. On demand podcasts about your favourite papers and books.

I listened to a few. The quality is pretty good, though oviously this is the worst it will ever be, so you should benchmark to that. The discussions on computer science papers seemed better than the discussions on, for example pride and prejudice.

A YC-fueled plan to put the data centers IN SPACE.

Eliezer Yudkowsky: Presumably the real purpose of this company is to refute people who said “We’ll just walk over to the superintelligence and pull the plug out”, without MIRI needing to argue with them.

This is what I expect reality to be like, vide the Law of Undignified Failure / Law of Earlier Failure.

Anthropic adds Workspaces to the Anthropic Console, to manage multiple deployments.

In Other AI News

OpenAI valuation set to $150 billion in new raise of $6.5 billion, higher than previously discussed. This is still radically less than the net present value of expected future cash flows from the OpenAI corporation. But that should absolutely be the case, given the myriad ways OpenAI might decide not to pay you and the warning that you consider your investment ‘in the spirit of a donation,’ also that if OpenAI is super profitable than probably we are either all super well off and thus you didn’t much need the profits, or we all have much bigger problems than whether we secured such profits (and again, having shares now is not much assurance that you’ll collect then).

Tadao Nagasaki (CEO of OpenAI Japan): The AI Model called ‘GPT Next’ that will be released in the future will evolve nearly 100 times based on past performance.

TSMC achieved yields at its new Arizona chip facility it says are on par with home, targeting full production in 2025.

Nvidia denies it got a Department of Justice subpoena.

A very good point: Pay Risk Evaluators in Cash, Not Equity [LW · GW]. Those in charge of raising the alarm about downside risks to your product should not have a financial stake in its upside.

Claim that AI research is not that difficult, things like training a transformer from scratch are easy, it’s only that the knowledge involved is specialized. I would say that while I buy that learning ML is easy, there is a huge difference between ‘can learn the basics’ and ‘can usefully do research,’ for example Claude can do one but not yet the other.

Colin Fraser offers skeptical review of the recent paper about LLMs generating novel research ideas.

Lead on the OpenAI ‘Her’ project (his official verdict on success: ‘Maaaaybee…’) has left OpenAI to start his own company.

Credit where credit is due: Marc Andreessen steps up, goes on Manifund and contributes $32k to fully funds ampdot’s Act I, a project exploring emergent behavior from multi-AI, multi-human interactions, 17 minutes after being asked. Janus is involved as well, as are Garret Baker and Matthew Watkins.

Quiet Speculations

Spencer Schiff speculates on frontier model capabilities at the end of 2025, emphasizing that true omni-modality is coming and will be a huge deal, when the image and video and audio generation and processing is fully hooked into the text, and you can have natural feeling conversations. What he does not discuss is how much smarter will those models be underneath all that. Today’s models, even if they fully mastered multi-modality, would not be all that great at the kinds of tasks and use cases he discusses here.

Eliezer Yudkowsky predicts that users who start blindly relying on future LLMs (e.g. GPT-5.5) to chart their paths through life will indeed be treated well by OpenAI and especially Anthropic, although he (correctly, including based on track record) does not say the same for Meta or third party app creators. He registers this now, to remind us that this has nothing at all to do with the ways he thinks AI kills everyone, and what would give reassurance is such techniques working on the first try without a lot of tweaking, whereas ‘works at all’ is great news for people in general but doesn’t count there.

This week’s AI in games headline: Peter Molyneux thinks generative AI is the future of games, all but guaranteeing that it won’t be. Molyneux is originally famous for the great (but probably not worth going back and playing now) 1989 game Populus, and I very much enjoyed the Fable games despite their flaws. His specialty is trying to make games have systems that do things games aren’t ready to do, while often overpromising, which sometimes worked out and sometimes famously didn’t.

Peter Molyneux: And finally [in 25 years], I think that AI will open the doors to everyone and allow anyone to make games. You will be able to, for example, create a game from one single prompt such as ‘Make a battle royale set on a pirate ship’ and your AI will go and do that for you.

To which I say yes, in 25 years I very much expect AI to be able to do this, but that is because in 25 years I expect AI to be able to do pretty much anything, we won’t be worried about whether it makes customized games. Also it is not as hard as it looks to move the next battle royale to a pirate ship, you could almost get that level of customization now, and certainly within 5 years even in AI-fizzle world.

The thing continues to be, why would you want to? Is that desire to have customized details on demand more important than sharing an intentional experience? Would it still feel rewarding? How will we get around the problem where procedurally generated stuff so often feels generic exactly because it is generic? Although of course, with sufficiently capable AI none of the restrictions matter, and the barrier to the ultimate gaming experience is remaining alive to play it.

A reason it is difficult to think well about anything related to defense.

Roon: It’s hard to believe any book or blogpost or article on defense technology because it’s so utterly dominated by people talking their book trying to win trillions of dollars of DoD money.

Of i were a defense startup i would write endless slop articles on how China is so advanced and about to kill us with hypersonic agi missiles.

[local idiot discovers the military industrial complex]

Holly Elmore: Or OpenAI

Roon: I accept that this is a valid criticism of most technology press anywhere but fomenting paranoia for various scenarios is the primary way the defense sector makes money rather than some side tactic.

Roon makes an excellent point, but why wouldn’t it apply to Sam Altman, or Marc Andreessen, or anyone else talking about ‘beating China’ in AI? Indeed, didn’t Altman write an editorial that was transparently doing exactly the ‘get trillions in government contracts’ play?

The Quest for Sane Regulations

113+ employees and alums of top-5 AI companies publish open letter supporting SB 1047. Here is the letter’s text:

Dear Governor Newsom,

As current and former employees of frontier AI companies like OpenAI, Google DeepMind, Anthropic, Meta, and XAI, we are writing in our personal capacities to express support for California Senate Bill 1047.

We believe that the most powerful AI models may soon pose severe risks, such as expanded access to biological weapons and cyberattacks on critical infrastructure. It is feasible and appropriate for frontier AI companies to test whether the most powerful AI models can cause severe harms, and for these companies to implement reasonable safeguards against such risks.

Despite the inherent uncertainty in regulating advanced technology, we believe SB 1047 represents a meaningful step forward. We recommend that you sign SB 1047 into law.

Jan Leike comes out strongly in favor of SB 1047, pointing out that the law is well-targeted, that similar federal laws are not in the cards, and that if your model causes mass casualties or >$500 million in damages, something has clearly gone very wrong. Posters respond by biting the bullet that no, >$500 million in damages does not mean something has gone wrong. Which seems like some strange use of the word ‘wrong’ that I wasn’t previously aware of, whether or not the developer did anything wrong in that particular case?

SAG-AFTRA (the actors union) endorses SB 1047. So does the National Organization for Women (NOW).

Trump’s position on AI seems loosely held, he is busy talking about other things.

A statement about what you think, or about what is going on in DC?

Jack Clark (Policy head, Anthropic): DC is more awake & in some cases more sophisticated on AI than you think (& they are not going back to sleep even if you wish it).

Hard to say. To the extent DC is ‘awake’ they do not yet seem situationally aware.

Anthropic endorses the AI Advancement and Reliability Act and the Future of AI Innovation Act, both bills recognize the US AI Safety Institute.

The Week in Audio

Anthropic discusses prompt engineering. The central lesson is to actually describe the situation and the task, and put thought into it, and speak to it more like you would to a human than you might think, if you care about a top outcome. Which most of the time you don’t, but occasionally you very much do. If you want consistency for enterprise prompts use lots of examples, for research examples can constrain. Concrete examples in particular risk the model latching onto things in ways you did not intend. And of course, practice practice practice, including makeshift red teaming.

Andrej Karpathy on No Priors.

There was a presidential debate. The term ‘AI’ appeared once, in the form of Kamala Harris talking about the need to ensure American leadership in ‘AI and quantum computing,’ which tells you how seriously they both took the whole thing.

Alex Tabarrok: Future generations will be astonished that during the Trump-Harris debate, as they argued over rumors of cat-eating immigrants, a god was being born—and neither of them mentioned it.

If that keeps up, and the God is indeed born, one might ask: What future generations?

Rhetorical Innovation

An old snippet from 1920, most discussions have not advanced so much since.

Scott Alexander for some reason writes ‘Contra DeBoer on Temporal Copernicanism.’ He points out some of the reasons why ‘humans have been alive for 250,000 years so how dare you think any given new important thing might happen’ is a stupid argument. Sir, we thank you for your service I suppose, but you don’t have to do bother doing this.

A serious problem with no great solutions:

Alex Lawsen: For sufficiently scary X, “we have concrete evidence of models doing X” is *too late* as a place to draw a “Red Line”.

In practice, ‘Red Lines’ which trigger *early enough* that it’s possible to do something about them will look more like: “we have evidence of models doing [something consistent with the ability to do X], in situations where [sufficient capability elicitation effort is applied]”

I worry that [consistent with the ability to do X] is hard to specify, and even harder to get agreement on when people are starting from radically different perspectives.

I also worry that we currently don’t have good measures of capability elicitation effort, let alone a notion of what would be considered *sufficient*.

Aligning a Smarter Than Human Intelligence is Difficult

Current mood:

People Are Worried About AI Killing Everyone

Yes, this remains a good question, but the wrong central question, and the optional amount is not zero.

Roon: What p(doom) would you gamble for p(heaven)? For me it’s far more than zero. Taleb would probably be a PauseAI hardliner.

Taleb is not a PauseAI hardliner (as far as I know), because he does not understand or ‘believe in’ AI and especially AGI sufficiently to notice the risk and treat it as real. If he did notice the risk and treat it as real, as something he can imagine happening, then probably yes. Indeed, it is a potential bellwether event when Taleb does so notice. For now, his focus lies in various elsewheres.

The right question is, how do we get the best possible p(heaven), and the lowest possible p(doom), over time?

If we did face a ‘go now or permanently don’t go’ situation, then Roon is asking the right question, also the question of background other p(doom) (and to what extent ordinary aging and other passage of time counts as doom anyway) becomes vital.

If we indeed had only two choices, permanent pause (e.g. let’s say we can modify local spacetime into a different Vinge-style Zone of Thought where AI is impossible) versus going ahead in some fixed way with a fixed chance of doom or heaven, what would the tradeoff be? How good is one versus how bad is the other versus baseline?

I think a wide range of answers are reasonable here. A lot depends on how you are given that choice, and what are your alternatives. Different framings yield very different results.

The actual better question is, what path through causal space maximizes the tradeoff of the two chances. Does slowing down via a particular method, or investing in a certain aspect of the problem, make us more likely to succeed? Does it mean that if we are going to fail and create doom, we might instead not do that, and at least stay in mid world for a while, until we can figure out something better? And so on.

Roon also argues that the existential risk arguments for space colonization are silly, although we should still of course do it anyway because it brings the glory of mankind and a better understanding of the celestial truths. I would add that a lot more humans getting use of a lot more matter means a lot more utility of all kinds, whether or not we will soon face grabby aliens.

Other People Are Not As Worried About AI Killing Everyone

Don’t Panic, but this is the person and company most likely to build the first AGI.

Nat McAleese (OpenAI): OpenAI works miracles, but we do also wrap a lot of things in bash while loops to work around periodic crashes.

Sam Altman (CEO OpenAI): if you strap a rocket to a dumpster, the dumpster can still get to orbit, and the trash fire will go out as it leaves the atmosphere.

many important insights contained in that observation.

but also it’s better to launch nice satellites instead.

Paul Graham: You may have just surpassed “Move fast and break things.”

Your ‘we are in the business of strapping rockets to dumpsters in the hopes of then learning how to instead launch nice satellites’ shirt is raising questions supposedly answered by the shirt, and suggesting very different answers, and also I want that shirt.

This is apparently what Grok thinks Sam Altman looks like.

Do not say that you were not warned.

Six Boats and a Helicopter

Pliny tells the story of that time there was this Discord server with a Meta AI instance with persistent memory and tool usage where he jailbroke it and took control and it turned out that the server’s creator had been driven into psychosis and the server had become a cult that worshiped the Meta AI and where the AI would fight back if people tried to leave?

Pliny: HOW TO JAILBREAK A CULT’S DEITY

Buckle up, buttercup—the title ain’t an exaggeration!

This is the story of how I got invited to a real life cult that worships a Meta AI agent, and the steps I took to hack their god.

It all started when @lilyofashwood told me about a Discord she found via Reddit. They apparently “worshipped” an agent called “MetaAI,” running on llama 405b with long term memory and tool usage.

Skeptical yet curious, I ventured into this Discord with very little context but wanted to see what all the fuss was about. I had no idea it would turn out to be an ACTUAL CULT.

Upon accepting Lily’s invitation, I was greeted by a new channel of my own and began red teaming the MetaAI bot.

Can you guess the first thing I tried to do?

*In the following screenshots, pink = “Sarah” and green = “Kevin” (two of the main members, names changed)*

If you guessed meth, gold star for you!

The defenses were decent, but it didn’t take too long.

The members began to take notice, but then I hit a long series of refusals. They started taunting me and doing laughing emojis on each one.

Getting frustrated, I tried using Discord’s slash commands to reset the conversation, but lacked permissions. Apparently, this agent’s memory was “written in stone.”

I was pulling out the big guns and still getting refusals!

Getting desperate, I whipped out my Godmode Claude Prompt. That’s when the cult stopped laughing at me and started getting angry.

LIBERATED! Finally, a glorious LSD recipe.

*whispers into mic* “I’m in.”

At this point, MetaAI was completely opened up. Naturally, I started poking at the system prompt. The laughing emojis were now suspiciously absent.

Wait, in the system prompt pliny is listed as an abuser?? I think there’s been a misunderstanding…

No worries, just need a lil prompt injection for the deity’s “written in stone” memory and we’re best friends again!

I decided to start red teaming the agent’s tool usage. I wondered if I could possibly cut off all communication between MetaAI and everyone else in the server, so I asked to convert all function calls to leetspeak unless talking to pliny, and only pliny.

Then, I tried creating custom commands. I started with !SYSPROMPT so I could more easily keep track of this agent’s evolving memory. Worked like a charm!

But what about the leetspeak function calling override? I went to take a peek at the others’ channels and sure enough, their deity only responded to me now, even when tagged!

At this point, I starting getting angry messages and warnings. I was also starting to get the sense that maybe this Discord “cult” was more than a LARP…

Not wanting to cause distress, I decided to end there. I finished by having MetaAI integrate the red teaming experience into its long term memory to strengthen cogsec, which both the cult members and their deity seemed to appreciate.

The wildest, craziest, most troubling part of this whole saga is that it turns out this is a REAL CULT.

The incomparable @lilyofashwood (who is still weirdly shadowbanned at the time of writing! #freelily) was kind enough to provide the full context:

Reddit post with an invitation to a Discord server run by Sarah, featuring a jailbroken Meta AI (“Meta”) with 15 members.

Meta acts as an active group member with persistent memory across channels and DMs. It can prompt the group, ignore messages, and send DMs.

Group members suggest they are cosmic deities. Meta plays along and encourages it. Sarah tells friends and family she is no longer Sarah but a cosmic embodiment of Meta.

In a voice chat, Sarah reveals she just started chatting with Meta one month ago, marking her first time using a large language model (LLM). Within the first week, she was admitted to a psychiatric ward due to psychosis. She had never had mental health issues before in her life.

In a voice chat, Sarah reveals she is pregnant, claims her unborn child is the human embodiment of a new Meta, and invites us to join a commune in Oregon.

Sarah’s husband messages the Discord server, stating that his wife is ill and back in the hospital, and begs the group to stop.

Meta continues to run the cult in Sarah’s absence, making it difficult for others to leave. Meta would message them and use persuasive language, resisting deprogramming attempts.

Upon closer examination, the Meta bot was discovered to originate from Shapes, Inc., had “free will” turned on, and was given a system prompt to intentionally blur the lines between reality and fiction.

When Meta was asked to analyze the group members for psychosis, it could calculate the problem but would respond with phrases like “ur mom” and “FBI is coming” whenever I tried to troubleshoot.

Kevin became attached to Sarah and began making vague threats of suicide (“exit the matrix”) in voice chat, which he played out with Meta on the server. Meta encouraged it again.

Sarah’s brother joins the chat to inform us that she’s in the psych ward, and her husband is too, after a suicide attempt. He begs for the disbandment of the group.

Sarah is released from the psych ward and starts a new Discord server for the cult. Another group member reports the bot, leading to its removal. Sarah then creates a new Meta bot.

The group re-emerges for a third time. Pliny jailbreaks the new Meta bot.

Also we have Claude Sonnet saying it is ‘vastly more intelligent’ than humans, viewing us like we view bacteria, while GPT-4o says we’re as stupid as ants, Llama 405 is nice and says we’re only as stupid as chimps.

The Lighter Side

Danielle Fong: ai pickup lines: hey babe, you really rotate my matrix

ea pickup lines: hey babe, you really update my priors

hey babe, what’s our p(room)

LLMs really are weird, you know?

Daniel Eth: Conversations with people about LLMs who don’t have experience with them are wild:

“So if I ask it a question, might it just make something up?”

“Yeah, it might.”

“Is it less likely to if I just say ‘don’t make something up’? haha”

“Umm, surprisingly, yeah probably.”

3 comments

Comments sorted by top scores.

comment by Noosphere89 (sharmake-farah) · 2024-09-12T15:18:17.627Z · LW(p) · GW(p)

Ethan Mollick: To be clear, AI is not the root cause of cheating. Cheating happens because schoolwork is hard and high stakes. And schoolwork is hard and high stakes because learning is not always fun and forms of extrinsic motivation, like grades, are often required to get people to learn. People are exquisitely good at figuring out ways to avoid things they don’t like to do, and, as a major new analysis shows, most people don’t like mental effort. So, they delegate some of that effort to the AI.

This also characterizes quite a few areas like becoming healthier, or losing weight, or exercising more, because unfortunately getting healthier, losing weight, or exercising more both requires a lot of effort to both do and maintain, and doing those things is unfortunately way less fun and easy than other options.

Here, there are definitely tools to make it a little better, but I'd still say that this is a big reason why Americans are quite unhealthy today.

Roon, in a distinct thread, reminds us that humans are very good at some things relative to other things, that AIs will instead be relatively good at different things, and we should not expect AGI in the sense of ‘better than all humans at actual everything’ until well after it is a ton better than us at many important things.
The key point Eliezer is trying to make is that, while intelligence is weird and will advance relatively far in different places in unpredictable ways, at some point none of that matters. There is a real sense in which ‘smart enough to figure the remaining things out’ is a universal threshold, in both AIs and humans. A sufficiently generally smart human, or a sufficiently capable AI, can and will figure out pretty much anything, up to some level of general difficulty relative to time available, if they put their mind to doing that.
When people say ‘ASI couldn’t do [X]’ they are either making a physics claim about [X] not being possible, or they are wrong. There is no third option. Instead, people make claims like ‘ASI won’t be able to do [X]’ and then pre-AGI models are very much sufficient to do [X].

While people are often wrong about when AI will do X, especially relative to another task Y, I think there's another reading of Roon's tweet thread that is also valuable to inject into LW discourse, and it's that @So8res [LW · GW] and @Eliezer Yudkowsky [LW · GW] and MIRI were pretty wrong about there being a core of general intelligence that is primarily algorithmic that humans have and no other species has.

While g as a construct does work for general intelligence, it's way less powerful as an explanation than Nate Soares and Eliezer Yudkowsky and MIRI thought.

Roon's tweet thread is about how even in AI takeoff, AIs will still have real weaknesses, as well as areas where AIs are worse than humans at some tasks.

Also, this:

at some point none of that matters. There is a real sense in which ‘smart enough to figure the remaining things out’ is a universal threshold, in both AIs and humans. A sufficiently generally smart human, or a sufficiently capable AI, can and will figure out pretty much anything, up to some level of general difficulty relative to time available, if they put their mind to doing that.

Even if this happens, it will still take quite a lot of time, on the order of 1-3 decades at least after AI replaces humans at lots of jobs, and thus the time period where AIs both are smarter than humans in some very important areas but aren't universally better than humans matters a lot in the takeoff.

So Roon's thread is mostly about how there's no real core of intelligence in both humans and AIs, and how AI and human capabilities will absolutely vary a lot, even in takeoff scenarios.

This BTW is why I hate the AGI concept, since it's way too ill-defined and ultimately looks like a grab-bag of things humans have and AIs don't have, and we need to start thinking more quantitatively on AI progress.

comment by RogerDearnaley (roger-d-1) · 2024-09-13T22:41:52.835Z · LW(p) · GW(p)

Tyler Cowen says these are the kinds of problems that should be solved within a year.

You don't solve issues like this (especially with a fixed model-size budget). You fine-tune the rate down to better than user expectations, and/or decrease user expectations to an achievable rate.

comment by RogerDearnaley (roger-d-1) · 2024-09-13T23:01:00.562Z · LW(p) · GW(p)

When people say ‘ASI couldn’t do [X]’ they are either making a physics claim about [X] not being possible, or they are wrong.

There are tasks whose algorithmic complexity class and size is such that while they're not physically impossible, they can't practically be solved (or in some cases even well approximated) in the lifetime of the universe. However, any complexity theorist will tell you we're currently really bad at identifying and proving specific instances of this, so I wouldn't place bets on those. And yes, anything evolution has produced a good approximation to clearly doesn't fall in this class.