Bimodal AI Beliefs

post by Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · LW · GW · 1 comments

Contents

  The Lay Experience
  Contrasts
  Implications

Much is said about society's general lack of AI situational awareness. One prevailing topic of conversation in my social orbit is our ongoing bafflement about how so many other people we know, otherwise smart and inquisitive, seem unaware of or unconcerned about AI progress, x-risk, etc. This hardly seems like a unique experience.

We can all see that there's a lot of motivated reasoning nowadays, now that some industries are starting to understand that sufficiently good AI would introduce massive structural changes or render them obsolete. But the usual suspects also include things like how AI risk (existential and otherwise) flips the usual intuition about the efficiencies gained from new technologies on its head, and how difficult it is in general to imagine the future being a very different kind of world. Of course, the world does change rapidly, and to reason well about it you have to be open to ideas that initially feel weird, but these are all ideas that are not commonly discussed outside communities like this one.

I offer a more innocent explanation for why so many people seem not to grasp both the current capabilities of AI and the trajectory we're on.

The Lay Experience

Consider the experience of the median layperson. It starts when someone (a friend, ad, etc.) makes big claims about what ChatGPT can do and says that you can access those capabilities in plain English. In this way, people sign up, greet it and play around with it a bit. At some point, prompted (heh) by those big claims about AI capabilities, they try to test it by asking it increasingly tricky questions about domains they're familiar with. It does fine at first, but eventually it gets some detail wrong and the illusion of general intelligence is broken. Then the person buckets it into the cognitive category of "toy" and it's over.

Are they wrong? Well, it depends on the questions they asked. If they asked good questions and the LLM got them wrong, they found the frontier of some capability (or a hallucination). If they asked poorly formed questions and the LLM didn't know what to do, then of course it will flounder, be nonspecific, or generally seem like a toy. In both cases, whether or not the "toy" category is correct in the user's chosen domain, the overconfidence of LLMs in the face of ambiguity is a genuine UX problem, particularly when it is reinforced by the aforementioned big claims about AI capabilities, measured against background conditions in which the world has not (yet) changed much. In that context, the intelligence of the product feels like marketing spin.

Now let's focus specifically on AI marketing claims. Here I'm not talking about any specific company, person, or advertisement, but about the tone the big AI labs and their users create around their products in the aggregate. That messaging presents the product as a tool for everyone: it promises access to specialized knowledge, complex conversation with users, help with everyday tasks, and a boost to productivity. It does not, in any sense, suggest that you need to know how to prompt the model effectively to access its strongest capabilities.

Unfortunately, you really do need to know how to do that.

This should be obvious to LLM power users who have seen the difference between the best and worst outputs. But let's keep this abstract for now.

Contrasts

Consider the contrast between the experience of the median layperson above and that of LLM power users who know what models are good and bad at and have a sense of how to craft good prompts. I'm referring to the type of prompt engineering skill that is an art and not a science. Such a user will ask better questions, and thus get better results, in a way that is at least loosely self-reinforcing as their skill grows. This is especially true if they use the outputs in the course of their job, because that probably triples the time during which using an LLM may come to mind.

There is also an effect where a power user (someone using an LLM for work, for example) can forgive the occasional hallucination, because they get better at noticing hallucinations and they get so much benefit overall. Humans make mistakes; an LLM does not need to make zero mistakes to act intelligently by human standards. It just needs to equal or improve upon the human error rate. In this way a power user is much less likely than someone with a lay perspective to see a hallucination and reflexively dismiss the technology, even if neither one knows how LLMs work under the hood.

So in sum, the first group either bounces off the technology or doesn't know how to get the best outputs, and is more inclined to be critical. The second group embraces the technology, learns how to prompt very well, and probably becomes more forgiving of errors. Opinions about the technology will trend downward in the first group and upward in the second, in a way that strengthens over time as capabilities improve and prompting skill remains important.

In this model (independent of anyone's relative intelligence, understanding of how LLMs work, or ideas about AI alignment, gradual disempowerment, x-risk, etc.), beliefs about AI capabilities should naturally trend toward a bimodal distribution. Which group you trend into is thus a function of how much attention you pay to AI research, yes, but also of how much time you spend learning to use these models and trying to get real work done with them.
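
To make this dynamic concrete, here is a minimal toy simulation of the feedback loop described above: prompting skill improves output quality, good experiences raise opinion, and opinion drives how often someone reaches for the tool at all. Every parameter (starting skill range, error tolerance, learning rates, and so on) is invented purely for illustration; nothing in the argument depends on these particular numbers.

```python
import random

def simulate_beliefs(n_people=1000, steps=50, seed=0):
    """Toy model of the feedback loop described in the post.

    Each person has a prompting skill and an opinion of LLMs.
    Better skill -> better outputs -> higher opinion -> more use -> more skill.
    Bad early experiences push usage (and so skill growth) down.
    All constants are illustrative, not empirical.
    """
    rng = random.Random(seed)
    final_opinions = []
    for _ in range(n_people):
        skill = rng.uniform(0.0, 0.3)     # almost everyone starts as a novice
        opinion = rng.uniform(0.4, 0.6)   # roughly neutral prior about "AI"
        for _ in range(steps):
            usage = opinion               # how often an LLM comes to mind at all
            # Output quality depends mostly on prompting skill, plus noise
            # standing in for hallucinations and lucky answers.
            quality = skill + rng.gauss(0.0, 0.15)
            good_experience = quality > 0.35
            # Opinion drifts toward the experience; skill grows with use.
            opinion += 0.1 * ((1.0 if good_experience else 0.0) - opinion) * usage
            skill += 0.02 * usage
            opinion = min(max(opinion, 0.0), 1.0)
            skill = min(skill, 1.0)
        final_opinions.append(opinion)
    return final_opinions

if __name__ == "__main__":
    opinions = simulate_beliefs()
    # Crude text histogram: counts per decile of final opinion.
    buckets = [0] * 10
    for o in opinions:
        buckets[min(int(o * 10), 9)] += 1
    for i, count in enumerate(buckets):
        print(f"{i/10:.1f}-{(i+1)/10:.1f}: {'#' * (count // 10)}")
```

Under these made-up parameters, final opinions tend to pile up toward the two ends rather than the middle; the point is only that the feedback loop, not anyone's starting intelligence, does the sorting.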

Implications

When I see a bimodal distribution like this, I become concerned about tribalism. I don't expect much more of it here than we already see, because at some point (probably pretty soon) capabilities will become so impressive that lots of people will get disrupted and nobody sensible will deny the situation. The bimodal distribution will eventually collapse into a general understanding of the situation. But before that happens, it does have implications for how to talk about AI and advocate for controls.

For example, outside of the specific types of work where LLMs are most useful, such as software development, we should expect that people on average will not be very skilled at prompting and thus will not personally experience the strongest capabilities of frontier models. We should expect this to remain true on average even if they try to explore those capabilities, at least until the next iteration of models is released, and probably even then, because prompting does not yet seem to be declining in importance.

Accordingly, in the short term, we should expect an increasing disconnect between the groups as capabilities improve but remain unevenly accessible.[1] As noted above, this will hold until capabilities become undeniable (or until we get AGI, at which point we have other problems); only then will mainstream society start really paying attention to the slope of AI progress.

Overall I think this speaks to how we are probably not well-served talking about the current value propositions of LLMs as general-purpose tools for everyone. They are that, in the sense that they can be used productively across many disciplines, but they are also not that, in the sense that the benefits are unevenly distributed toward people whose interests or incentives prime them to spend a lot of time building the skill of prompting. It is more like learning how to paint than learning to ride a bike: fundamentally it is a matter of learning and familiarity that anyone can accomplish, but many people will not choose to do so.

In the meantime, I think LLMs are better imagined and discussed as specialized tools that require finesse to use most effectively. That framing, it seems to me, sets more accurate expectations for people approaching an LLM for the first time.

  1. ^

    Note the DeepSeek r1 phenomenon as a rare case in which this disconnect collapsed a bit. Its release in January was the first time many people were exposed to a CoT model, given that most people only use free models, and the jump between those and r1 is credibly large even with less effective prompting.

1 comment

comment by keltan · 2025-02-14T23:26:34.215Z · LW(p) · GW(p)

I think this post points at something quite important. I might suggest adding a TLDR at the top, because the implication section is most valuable, but gets buried.

Anyhow, strong upvote from me.