Why don't we currently have AI agents?

post by ChristianKl · 2024-12-26T15:26:35.682Z · LW · GW · 2 comments

This is a question post.


Intuitively, the AutoGPT concept sounds like it should be useful if a company invests in it. Yet all the big publicly available systems seem to be chat interfaces where the human writes a message and then the computer writes another message.

Even if an AutoGPT driven by an LLM alone wouldn't achieve every goal, a combination where a human could oversee the steps and shepherd the agent could likely be very productive.

The idea sounds to me like it's simple enough that people at big companies should have considered it. Why isn't something like that deployed?

Answers

answer by Matt Goldenberg · 2024-12-26T20:19:55.650Z · LW(p) · GW(p)

When you start trying to make an agent, you realize how much your feedback, rerolls, etc. are what make chat-based LLMs useful

the error correction mechanism in a chat-based LLM is you, and in the absence of that, it's quite easy for agents to get off track

you can of course add error correction mechanisms like multiple LLMs checking each other, multiple chains of thought, etc., but the cost can quickly get out of hand
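
A minimal sketch of that kind of automated error correction, just to illustrate the cost dynamic. `call_llm`, the prompts, and the retry structure are all assumptions for illustration, not taken from any particular product:

```python
# Hypothetical sketch: a "worker" LLM proposes each step and a separate
# "checker" LLM vets it. call_llm is a placeholder, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for whatever chat-completion API you use."""
    raise NotImplementedError

def run_step(task: str, history: list[str], max_retries: int = 3) -> str:
    calls = 0
    for _ in range(max_retries):
        proposal = call_llm(
            f"Task: {task}\nSteps so far: {history}\nPropose the next step."
        )
        verdict = call_llm(
            f"Task: {task}\nProposed step: {proposal}\n"
            "Does this step move the task forward? Answer OK or BAD, with a reason."
        )
        calls += 2
        if verdict.strip().upper().startswith("OK"):
            return proposal
    # Each rejected proposal costs two more model calls, and that's with a
    # single checker; add more checkers or chains of thought and it multiplies.
    raise RuntimeError(f"no accepted step after {calls} LLM calls")
```

Even this single-checker version doubles the per-step call count, which is where the cost gets out of hand.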

comment by ChristianKl · 2024-12-27T10:01:01.145Z · LW(p) · GW(p)

This answer assumes that you either have a fully chat-based version or one that operates fully autonomously.

You could build something in the middle, where every step the agent takes is presented to a human who can press next or correct the agent. An agent might even propose multiple ways forward and let the human decide. That then produces the training data for the agent to get better in the future.
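
A rough sketch of that middle ground, again with `call_llm` as a hypothetical stand-in for an LLM API and the logging format invented for illustration:

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for whatever chat-completion API you use."""
    raise NotImplementedError

def supervised_agent(task: str, log_path: str = "corrections.jsonl") -> list[str]:
    history: list[str] = []
    while True:
        proposal = call_llm(
            f"Task: {task}\nSteps so far: {history}\n"
            "Propose the next step, or say DONE if the task is finished."
        )
        if proposal.strip() == "DONE":
            break
        # The human presses enter to accept, or types a corrected step.
        answer = input(f"Proposed step:\n  {proposal}\n[Enter] to accept, or type a correction: ")
        step = answer.strip() or proposal
        # Log (what was proposed, what was actually used) as future training data.
        with open(log_path, "a") as f:
            f.write(json.dumps({"task": task, "proposed": proposal, "used": step}) + "\n")
        history.append(step)
    return history
```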

comment by Matt Goldenberg (mr-hire) · 2024-12-27T14:51:53.888Z · LW(p) · GW(p)

This exists and is getting more popular, especially with coding, but also in other verticals

comment by ChristianKl · 2024-12-28T01:19:22.091Z · LW(p) · GW(p)

Which ones do you see as the top ones?

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-12-29T03:07:02.132Z · LW(p) · GW(p)

I've been using Aider recently for coding. It's a mixed bag, but overall I think I like it. You can configure whether it just acts or asks for permission first.

answer by Buck · 2024-12-26T16:58:51.402Z · LW(p) · GW(p)

I have an AI agent that wrote myself; I use it on average 5x per week over the last 6 months. I think it's moderately useful. I mostly use it for simple shell tasks that would otherwise require copy-pasting back and forth with claude.ai.

My guess is that the big AI companies don't think the market for this is big enough to be worth making a product out of it.

comment by Leon Lang (leon-lang) · 2024-12-26T22:28:08.617Z · LW(p) · GW(p)

I have an AI agent that wrote myself

Best typo :D

answer by CBiddulph · 2024-12-26T16:04:12.616Z · LW(p) · GW(p)

Anthropic's computer use model and Google's Deep Research both do this. Training systems like this to work reliably has been a bottleneck to releasing them

2 comments


comment by Gordon Seidoh Worley (gworley) · 2024-12-26T20:36:18.661Z · LW(p) · GW(p)

I can't help but wonder if part of the answer is that they seem dangerous and people are selecting out of producing them.

Like, I'm not an expert, but creating AI agents seems extremely fun and appealing, and I'm intentionally not working on them because it seems safer not to build them. (Whether you think my contributions to trying to build them would matter or not is another question.)

comment by Sodium · 2024-12-29T08:13:22.465Z · LW(p) · GW(p)

I think the actual answer is: the AI isn't smart enough and trips up a lot.

But I haven't seen a detailed write-up anywhere that talks about why the AI trips up and where it tends to trip up. It feels like all of the existing evals work optimizes for legibility/reproducibility/being clearly defined. As a result, it's not measuring the one thing that I'm really interested in: why don't we have AI agents replacing workers? I suspect that some startup's internal doc on "why does our agent not work yet" would be super interesting to read and track over time.