Posts

The nihilism of NeurIPS 2024-12-20T23:58:11.858Z
Can quantised autoencoders find and interpret circuits in language models? 2024-03-24T20:05:50.125Z

Comments

Comment by charlieoneill (kingchucky211) on The nihilism of NeurIPS · 2024-12-25T21:44:35.813Z · LW · GW

Thank you for laying out a perspective that balances real concerns about misaligned AI with the assurance that our sense of purpose needn’t be at risk. It’s a helpful reminder that human value doesn’t revolve solely around how “useful” we are in a purely economic sense.

If advanced AI really can shoulder the kinds of tasks that drain our energy and attention, we might be able to redirect ourselves toward deeper pursuits—whether that’s creativity, reflection, or genuine care for one another. Of course, this depends on how seriously we approach ethical issues and alignment work; none of these benefits emerge automatically.

I also like your point about how Zen practice emphasises that our humanity isn’t defined by constant production. In a future where machines handle much of what we’ve traditionally laboured over, the task of finding genuine meaning will still be ours.

Comment by charlieoneill (kingchucky211) on The nihilism of NeurIPS · 2024-12-25T21:41:57.147Z · LW · GW

You raise a good point: sometimes relentlessly pursuing a single, rigid “point of it all” can end up more misguided than having no formal point at all. In my more optimistic moments, I see a parallel in how scientific inquiry unfolds: there is no fixed endpoint, yet the process itself remains worth pursuing.

What keeps me from sliding into pure nihilism is the notion that we can hold meaning lightly but still genuinely. We don’t have to decide on a cosmic teleology to care deeply about each other, or to cherish the possibility of building a better future—especially now, as AI’s acceleration broadens our horizons and our worries. Perhaps the real “point” is to keep exploring, keep caring, and stay flexible in how we define what we’re doing here.

Comment by charlieoneill (kingchucky211) on The nihilism of NeurIPS · 2024-12-25T21:40:46.770Z · LW · GW

I really appreciate your perspective on how much of our drive for purpose is bound up in social signalling and the mismatch between our rational minds and the deeper layers of our psyche. It certainly resonates that many of the individuals gathered at NeurIPS (or any elite technical conference) are restless types, perhaps even deliberately so. Still, I find a guarded hope in the very fact that we keep asking these existential questions in the first place—that we haven’t yet fully succumbed to empty routine or robotic pursuit of prestige.

The capacity to reflect on "why we’re doing any of this" might be our uniquely human superpower - even if our attempts at answers are messy or incomplete. As AI becomes more intelligent, I’m cautiously optimistic we might engineer systems that help untangle some of our confusion. If these machines "carry our purposes," as you say, maybe they’ll help us refine those purposes, or at least hold up a mirror we can learn from. After all, intelligence by itself doesn’t have to be sterile or destructive; we have an opportunity to shape it into something that catalyses a more integrated, life-affirming perspective for ourselves.

Comment by charlieoneill (kingchucky211) on Can quantised autoencoders find and interpret circuits in language models? · 2024-04-04T08:45:35.610Z · LW · GW

I agree - you need to actually measure the specificity and sensitivity of your circuit identification. I'm currently doing this with attention heads specifically, rather than just the layers. However, I'd push back on the notion of "overfitting", because the VQ-VAE is essentially fully unsupervised - it's not really about the DT overfitting: as long as training and eval error are similar, you are simply looking for codes that distinguish positive from negative examples. If iterating over these codes also finds the circuit responsible for the positive examples, then this isn't overfitting but rather a fortunate case of the codes corresponding closely to the actions of the circuit for the task, which is exactly what we want.
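
For concreteness, here's a rough sketch of the kind of check I mean, scoring the flagged attention heads against a reference circuit as a binary classification problem (the head sets, model shape and helper name below are illustrative placeholders, not my actual setup):

```python
# Score circuit identification over attention heads. `predicted_heads` are the
# heads flagged by iterating over VQ codes; `true_heads` is a reference circuit
# (e.g. from ablation or path patching). All values here are hypothetical.

def circuit_scores(predicted_heads, true_heads, n_layers, n_heads):
    all_heads = {(l, h) for l in range(n_layers) for h in range(n_heads)}
    tp = len(predicted_heads & true_heads)              # correctly flagged heads
    fp = len(predicted_heads - true_heads)              # flagged but not in the circuit
    fn = len(true_heads - predicted_heads)              # missed circuit heads
    tn = len(all_heads - predicted_heads - true_heads)  # correctly ignored heads
    sensitivity = tp / (tp + fn) if tp + fn else 0.0    # recall over the true circuit
    specificity = tn / (tn + fp) if tn + fp else 0.0    # how few irrelevant heads get flagged
    return sensitivity, specificity

# Hypothetical example for a 12-layer, 12-head model
sens, spec = circuit_scores(
    predicted_heads={(9, 6), (9, 9), (10, 0)},
    true_heads={(9, 6), (9, 9), (10, 0), (10, 7)},
    n_layers=12, n_heads=12,
)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```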

I agree that VQ-VAEs are promising, but you can't say they're more scalable than SAEs, because SAEs don't have to have eight times as many features as the dimension of whatever they're dictionary-learning over. In fact, I've found you can set the number of features lower than the dimension and it works well for this sort of thing (which I'll be sharing soon). Many people seem to want to scale the number of features up significantly to achieve "feature splitting", but I actually think for circuit identification it makes more sense to use a smaller number of features, to ensure only general behaviours (for the attention heads themselves) are captured.
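
To illustrate, here's a minimal sketch of a sparse autoencoder whose dictionary is smaller than the activation dimension (the architecture, shapes and hyperparameters are illustrative only, not the exact setup I'll be sharing):

```python
import torch
import torch.nn as nn

class SmallDictSAE(nn.Module):
    """Sparse autoencoder with fewer dictionary features than the input
    dimension (the opposite of the usual 8x expansion). Illustrative only."""

    def __init__(self, d_model: int, n_features: int, l1_coeff: float = 1e-3):
        super().__init__()
        assert n_features < d_model  # the point of this sketch
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, x):
        codes = torch.relu(self.encoder(x))   # sparse, non-negative features
        recon = self.decoder(codes)
        mse = (recon - x).pow(2).mean()       # reconstruction error
        l1 = codes.abs().mean()               # sparsity penalty
        return recon, codes, mse + self.l1_coeff * l1

# Hypothetical shapes: 64-dimensional head-output activations compressed
# into 32 general-purpose features.
sae = SmallDictSAE(d_model=64, n_features=32)
acts = torch.randn(256, 64)                   # stand-in for cached activations
recon, codes, loss = sae(acts)
loss.backward()
```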

Thanks for your thoughts, and I look forward to reading your lie detection code!

Comment by charlieoneill (kingchucky211) on Open Thread – Winter 2023/2024 · 2024-03-22T22:01:25.522Z · LW · GW

@Ruby @Raemon @RobertM I've had a post waiting to be approved for almost two weeks now (https://www.lesswrong.com/posts/gSfPk8ZPoHe2PJADv/can-quantised-autoencoders-find-and-interpret-circuits-in, username: charlieoneill). Is this normal? Cheers!