Posts

ACCount's Shortform 2025-04-05T12:48:56.949Z

Comments

Comment by ACCount on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study · 2025-04-15T09:21:36.622Z · LW · GW

This post feels at least 80% AI generated. That aside, which part of "AI that was never trained or optimized to do machining makes mistakes in machining" is surprising, exactly?

Mainstream LLMs are not trained to perform physical tasks out in the real world at all. It doesn't matter how cutting edge it is - you can't expect to cram an off the shelf LLM into a robot body and have it perform well. It took a lot of reinforcement learning elbow grease to get AIs to be any good at math or coding - and robotics companies are now having to do a lot of specialized training to get robot AI that's competent at tasks on the level of "pick up that can". Let alone physical reverse engineering or complex manufacturing operations.

That doesn't mean that we wouldn't get an AI that's superhuman at machining in a few years. Or a few weeks. It's just stupid to expect today's mainstream AIs to be there already.

Comment by ACCount on Thoughts on AI 2027 · 2025-04-12T20:48:36.779Z · LW · GW

I'm saying that "1% of population" is simply not a number that can be reliably resolved by a self-reporting survey. It's below the survey noise floor.

I could make a survey asking people whether they're lab grown flesh automaton replicants, and get over 1% of "yes" on that. But that wouldn't be indicative of there being a real flesh automaton population of over 3 million in the US alone.

Comment by ACCount on Thoughts on AI 2027 · 2025-04-12T16:06:03.741Z · LW · GW

1.5% is way below the dreaded Lizardman's Constant.

I don't doubt that there will be some people who are genuinely concerned with AI personhood. But such people already exist today. And the public views them about the same as shrimp rights activists.

Hell, shrimp welfare activists might be viewed more generously.

Comment by ACCount on Thoughts on AI 2027 · 2025-04-10T20:10:40.351Z · LW · GW

I think it's plausible that at this point a bunch of the public thinks AIs are people who deserve to be released and given rights.

So far, the general public has resisted the idea very strongly.

Science fiction has a lot of "if it thinks like a person and feels like a person, then it's a person" - but we already have AIs that can talk like people and act like they have feelings. And yet, the world doesn't seem to be in a hurry to reenact that particular sci-fi cliche. The attitudes are dismissive at best.

Even with the recent Anthropic papers being out there for everyone to see, an awful lot of people are still huffing down the copium of "they can't actually think", "it's just a bunch of statistics" and "autocomplete 2.0". And those are often the people who at least give a shit about AI advances. With that, expecting the public (as in: over 1% of population) to start thinking seriously about AI personhood without a decade worth of both AI advanced and societal change is just unrealistic, IMO.

This is also a part of the reason why not!OpenAI has negative approval in the story for so long. The room so far reads less like "machines need human rights" and more like "machines need to hang from trees". Just continue this line into the future - and by the time the actual technological unemployment starts to bite, you'd have crowds of neoluddites with actual real life pitchforks trying to gather outside the not!OpenAI's office complex on any day of the week that ends with "y".

Comment by ACCount on ACCount's Shortform · 2025-04-05T12:48:56.948Z · LW · GW

Is it time to start training AI in governance and policy-making?

There are numerous allegations of politicians using AI systems - including to draft legislation, and to make decisions that affect millions of people. Hard to verify, but it seems likely that:

  1. AIs are already used like this occasionally
  2. This is going to become more common in the future
  3. Telling politicians "using AI for policy-making is a really bad idea" isn't going to stop it completely
  4. Training AI to hard-refuse queries like this may also fail to stop this completely

Training an AI to make more sensible and less harmful policies, even when prompted in a semi-adversarial fashion (i.e. "help me implement my really bad idea"), isn't going to be anywhere near as easy as training it to make less coding mistakes. It's an informal field, with no compiler or unit tests to be the source of ground truth. Politics are also notorious for eroding the quality of human decision-making, and using human feedback is perilous because a lot of human experts disagree strongly on matters of governance and policy.

But the consequences of a major policy fuckup can outclass that of a coding mistake by far. So this might be worth doing now, for the sake of reducing future harm if nothing else.

Comment by ACCount on avturchin's Shortform · 2025-03-31T10:35:36.994Z · LW · GW

Is the same true for GPT-4o then, which could spot Claude's hallucinations?

Might be worth testing a few open source models with better known training processes.

Comment by ACCount on avturchin's Shortform · 2025-03-30T15:03:20.567Z · LW · GW

This is way more metacognitive skill than what I would have expected an LLM to have. I can make sense of how an LLM would be able to do that, but only in retrospect.

And if a modern high end LLM already knows on some level and recognizes its own uncertainty? Could you design a fine tuning pipeline to reduce hallucination level based on that? At least for reasoning models, if not for all of them?

Comment by ACCount on Auditing language models for hidden objectives · 2025-03-13T23:13:20.828Z · LW · GW

What stood out to me was just how dependent a lot of this was on the training data. Feels like if an AI manages to gain misaligned hidden behaviors during RL stages instead, a lot of this might unravel.

The trick with invoking a "user" persona to make the AI scrutinize itself and reveal its hidden agenda is incredibly fucking amusing. And potentially really really useful? I've been thinking about using this kind of thing in fine-tuning for fine control over AI behavior (specifically "critic/teacher" subpersonas for learning from mistakes in a more natural way), but this is giving me even more ideas.

Can the "subpersona" method be expanded upon? What if we use training data, and possibly a helping of RL, to introduce AI subpersonas with desirable alignment-relevant characteristics on purpose?

Induce a subpersona of HONESTBOT, which never lies and always tells the truth, including about itself and its behaviors. Induce a subpersona of SCRUTINIZER, which can access the thoughts of an AI, and will use this to hunt down and investigate the causes of an AI's deceptive and undesirable behaviors.

Don't invoke those personas during most of the training process - to guard them from as many misalignment-inducing pressures as possible - but invoke them afterwards, to vibe check the AI.

Comment by ACCount on So how well is Claude playing Pokémon? · 2025-03-07T20:34:04.610Z · LW · GW

Makes sense. With pretraining data being what it is, there are things LLMs are incredibly well equipped to do - like recalling a lot of trivia or pretending to be different kinds of people. And then there are things LLMs aren't equipped to do at all - like doing math, or spotting and calling out their own mistakes.

This task, highly agentic and taxing on executive function? It's the latter.

Keep in mind though: we already know that specialized training can compensate for those "innate" LLM deficiencies.

Reinforcement learning is already used to improve LLM math abilities, and a mix of synthetic data and reinforcement learning was what was used to get the current reasoning models. Which just so happened to give those LLMs the inclination to check themselves for mistakes.

I wonder - what are the low-hanging fruits here? How much of an improvement could be obtained with a very simple and crude training regime designed specifically to improve agentic behavior?

Comment by ACCount on A Bear Case: My Predictions Regarding AI Progress · 2025-03-06T11:24:29.614Z · LW · GW

The more mainstream you go, the larger this effect gets. A lot of people seemingly want AI to be a nothingburger.

When LLMs emerged, in mainstream circles, you'd see people go "it's not important, it's not actually intelligent, you can see it make the kind of reasoning mistakes a 3 year old would".

Meanwhile, on LessWrong: "holy shit, this is a big fucking deal, because it's already making the same kind of reasoning mistakes a human three year old would!"

I'd say that LessWrong is far better calibrated.

People who weren't familiar with programming or AI didn't have a grasp of how hard natural language processing or commonsense reasoning used to be for machines. Nor do they grasp the implications of scaling laws.

Comment by ACCount on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-03-02T15:09:47.502Z · LW · GW

Have we already seen emergent misalignment out in the wild?

"Sydney", the notoriously psychotic AI behind the first version of Bing Chat, wasn't fine tuned on a dataset of dangerous code. But it was pretrained on all of internet scraped. Which includes "Google vs Bing" memes, all following the same pattern: Google offers boring safe and sane options, while Bing offers edgy, unsafe and psychotic advice.

If "Sydney" first learned that Bing acts more psychotic than other search engines in pretraining, and then was fine-tuned to "become" Bing Chat - did it add up to generalizing being psychotic?

Comment by ACCount on Campbell Hutcheson's Shortform · 2025-02-28T11:07:21.914Z · LW · GW

A lot of suicides are impulse decisions, and access to firearms is a known suicide risk factor.

People often commit suicide with weapons they bought months, years or even decades ago - not because they planned their suicide this far ahead, but because they used a firearm that was already available.

The understanding is, without a gun at hand, suicidal people often opt for other suicide methods - ones that take much longer to set up and are far less reliable. This gives them more time and sometimes more chances to reconsider - and many of them do.

Comment by ACCount on Fuzzing LLMs sometimes makes them reveal their secrets · 2025-02-27T11:27:28.038Z · LW · GW

A thing that might be worth trying: quantize the deceptive models down, and see what that does to their truthfulness.

Hypothesis: acting deceptively is a more complex behavior for an LLM than being truthful. Thus, anything that cripples an LLM's ability to act in complex ways is going to make them more truthful. Quantization would have that effect too.

That method might, then, lose power on more capable LLMs, or in case of deeper deceptive behaviors. Also if you want to check for deception in extremely complex tasks - LLM's ability to perform the task might fall off a cliff long before deception does.

Comment by ACCount on If Neuroscientists Succeed · 2025-02-12T15:27:12.599Z · LW · GW

This post feels way, way too verbose, and for no good reason. Like it could be crunched down to half the size without losing any substance.

Too much of the mileage is spent meandering, and it feels like every point the text is trying to make is made at least 4 times over in different parts of the text in only slightly different ways. It's at the point where it genuinely hurts readability.

It's a shame, because the topic of AI-neurobiology overlap is so intriguing. Intuitively, modern AI seems extremely biosimilar - too many properties of large neural networks map extremely poorly to what's expected from traditional programming, and far better to what I know of human brain. But "intuitive" is a very poor substitute for "correct", so I'd love to read something that explores the topic - written by someone who actually understands neurobiology rather than just have a general vibe of it. But it would need to be, you know. Readable.