Posts

Joseph Miller's Shortform 2024-05-21T20:50:31.757Z
How To Do Patching Fast 2024-05-11T20:13:52.424Z
Why I'm doing PauseAI 2024-04-30T16:21:54.156Z
Global Pause AI Protest 10/21 2023-10-14T03:20:27.937Z
The International PauseAI Protest: Activism under uncertainty 2023-10-12T17:36:15.716Z
Even Superhuman Go AIs Have Surprising Failure Modes 2023-07-20T17:31:35.814Z
We Found An Neuron in GPT-2 2023-02-11T18:27:29.410Z

Comments

Comment by Joseph Miller (Josephm) on What would stop you from paying for an LLM? · 2024-05-23T02:57:22.153Z · LW · GW

Unfortunately the sharing function is broken for me.

Comment by Joseph Miller (Josephm) on What would stop you from paying for an LLM? · 2024-05-22T20:14:51.162Z · LW · GW

I am confused by takes like this - it just seems so blatantly wrong to me.

For example, yesterday I showed GPT-4o this image.

I asked it to show why (10) is the solution to (9). It wrote out the derivation in perfect LaTeX.

I guess this is in some sense a "trivial" problem, but I couldn't immediately think of the solution. It is googleable, but only indirectly, because you have to translate the problem into a more general form first. So to claim that LLMs are not useful, I think you have to have incredibly high standards for which problems count as easy or googleable, and place no value on the convenience of just asking the exact question and being able to ask follow-ups.

Comment by Joseph Miller (Josephm) on Joseph Miller's Shortform · 2024-05-21T20:50:31.957Z · LW · GW

As far as I can tell, BBC Tech News has not covered any of the recent OpenAI drama about NDAs or employees leaving.

But Scarlett Johansson 'shocked' by AI chatbot imitation is now the main headline.

Comment by Joseph Miller (Josephm) on Advice for Activists from the History of Environmentalism · 2024-05-17T00:12:46.747Z · LW · GW

Thanks, this is really useful.

I am of the opinion that you should use good epistemics when talking to the public or policy makers, rather than using bad epistemics to try to be more persuasive.

Do you have any particular examples as evidence of this? This is something I've been thinking a lot about for AI and I'm quite uncertain. It seems that ~0% of advocacy campaigns have good epistemics, so it's hard to have evidence about this. Emotional appeals are important and often hard to reconcile with intellectual honesty.

Of course there are different standards for good epistemics and it's probably bad to outright lie, or be highly misleading. But by EA standards of "good epistemics" it seems less clear if the benefits are worth the costs.

As one example, the AI Safety movement may want to partner with advocacy groups who care about AI using copyrighted data or unions concerned about jobs. But these groups basically always have terrible epistemics and partnering usually requires some level of endorsement of their positions.

As an even more extreme example, as far as I can tell about 99.9% of people have terrible epistemics by LessWrong standards, so to even expand to a decently sized movement you will have to fill the ranks with people who will constantly say and think things that you think are wrong.

Comment by Joseph Miller (Josephm) on How To Do Patching Fast · 2024-05-14T17:33:39.079Z · LW · GW

I'm not sure if this is intentional, but this explanation implies that edge patching can only be done between nodes in adjacent layers, which is not the case.
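
For concreteness, here is a minimal sketch (toy shapes, made-up tensor names) of why the shared residual stream lets you patch an edge between nodes that are many layers apart. It ignores the LayerNorm subtlety discussed in the other thread.

```python
import torch

# Toy shapes: batch=1, seq=4, d_model=8. These stand in for cached activations
# from a clean and a corrupted prompt; all names are illustrative.
clean_head_out = torch.randn(1, 4, 8)    # a layer-2 head's output on the clean prompt
corrupt_head_out = torch.randn(1, 4, 8)  # the same head's output on the corrupted prompt
clean_resid = torch.randn(1, 4, 8)       # residual stream entering a layer-7 MLP

# Every node writes into the shared residual stream, so the edge
# (layer-2 head -> layer-7 MLP) is patched by swapping only that head's
# contribution inside the downstream node's input, however many layers apart:
patched_mlp_input = clean_resid - clean_head_out + corrupt_head_out
```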

Comment by Joseph Miller (Josephm) on How To Do Patching Fast · 2024-05-14T17:31:40.065Z · LW · GW

Yes, you're correct that it does not work with LayerNorm between layers. I'm not aware of any models that do this. Are you?

Comment by Joseph Miller (Josephm) on How To Do Patching Fast · 2024-05-14T17:30:31.073Z · LW · GW

Did you try how this works in practice? I could imagine an SGD-based circuit finder could be pretty efficient (compared to brute-force algorithms like ACDC), I'd love to see that comparison some day!

Yes it does work well! I did a kind of write up here but decided not to publish for various reasons.

Do you have a link to a writeup of Li et al. (2023) beyond the git repo?

https://arxiv.org/abs/2309.05973

Comment by Joseph Miller (Josephm) on Rejecting Television · 2024-05-06T12:24:36.981Z · LW · GW

I quit YouTube a few years ago and it was probably the single best decision I've ever made.

However, I also found that I naturally substitute it with something else. For example, I subsequently became addicted to Reddit. When I quit Reddit, I substituted Hacker News and LessWrong; when I quit those, I substituted checking Slack, email and Discord.

Thankfully being addicted to Slack does seem to be substantially less harmful than YouTube.

I've found the app OneSec very useful for reducing addictions. It's an app blocker that doesn't actually block, it just delays you opening the page, so you're much less likely to delete it in a moment of weakness.

Comment by Joseph Miller (Josephm) on Why I'm doing PauseAI · 2024-05-04T18:51:41.117Z · LW · GW

Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?

I was thinking of a scenario where OpenAI deliberately gives it access to its own weights to see if it can self improve.

I agree that it would be more likely to just speed up normal ML research.

Comment by Joseph Miller (Josephm) on Why I'm doing PauseAI · 2024-05-03T19:22:06.726Z · LW · GW

While I want people to support PauseAI

the small movement that PauseAI builds now will be the foundation which bootstraps this larger movement in the future

Is one of the main points of my post. If you support PauseAI today you may unleash a force which you cannot control tomorrow.

Comment by Joseph Miller (Josephm) on Thoughts on seed oil · 2024-04-23T21:41:31.641Z · LW · GW

If you want to be healthier, we know ways you can change your diet that will help: Increase your overall diet “quality”. Eat lots of fruits and vegetables. Avoid processed food. Especially avoid processed meats. Eat food with low caloric density. Avoid added sugar. Avoid alcohol.

I'm confused - why are you so confident that we should avoid processed food? Isn't the whole point of your post that we don't know whether processed oil is bad for you? Where's the overwhelming evidence that processed food in general is bad?

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-13T19:30:13.583Z · LW · GW

Reconstruction loss is the CE loss of the patched model

If this is accurate then I agree that this is not the same as "the KL Divergence between the normal model and the model when you patch in the reconstructed activations". But Fengyuan described reconstruction score as: 

measures how replacing activations changes the total loss of the model

which I still claim is equivalent.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-10T19:14:00.832Z · LW · GW

I think just showing  would be better than reconstruction score metric because  is very noisy.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T22:56:35.343Z · LW · GW

there is a validation metric called reconstruction score that measures how replacing activations changes the total loss of the model

That's equivalent to the KL metric. Would be good to include as I think it's the most important metric of performance.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T15:00:04.834Z · LW · GW

Patch loss is different to L2. It's the KL Divergence between the normal model and the model when you patch in the reconstructed activations at some layer.
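
A minimal sketch of how that metric can be computed, assuming you already have logits from the normal model and from the patched model (function and argument names are mine):

```python
import torch
import torch.nn.functional as F

def kl_patch_loss(clean_logits: torch.Tensor, patched_logits: torch.Tensor) -> torch.Tensor:
    """Mean KL(normal model || patched model) per token.

    Both tensors have shape [batch, seq, vocab]: one from the unmodified model,
    one from the model with SAE reconstructions patched in at some layer.
    """
    clean_log_probs = F.log_softmax(clean_logits, dim=-1).flatten(0, 1)
    patched_log_probs = F.log_softmax(patched_logits, dim=-1).flatten(0, 1)
    # F.kl_div(input, target) computes KL(target || input); log_target=True
    # because we pass log-probs for both distributions.
    return F.kl_div(patched_log_probs, clean_log_probs,
                    log_target=True, reduction="batchmean")
```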

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T13:42:27.249Z · LW · GW

It would be good to benchmark the normalized and baseline SAEs using the standard metrics of patch loss and L0.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T13:40:46.297Z · LW · GW

What is Neuron Activity?

Comment by Joseph Miller (Josephm) on Gradient Descent on the Human Brain · 2024-04-02T13:00:03.614Z · LW · GW

Does anyone think this could actually be done in <20 years?

Comment by Joseph Miller (Josephm) on AE Studio @ SXSW: We need more AI consciousness research (and further resources) · 2024-03-27T22:11:24.540Z · LW · GW

materialism where we haven't discovered all the laws of physics yet — specifically, those that constitute the sought-for materialist explanation of consciousness

It seems unlikely that new laws of physics are required to understand consciousness? My claim is that understanding consciousness just requires us to understand the algorithms in the brain.

Without that real explanation, “atoms!” or “materialism!”, is just a label plastered over our ignorance.

Agreed. I don't think this contradicts what I wrote (not sure if that was the implication).

Comment by Joseph Miller (Josephm) on AE Studio @ SXSW: We need more AI consciousness research (and further resources) · 2024-03-26T22:21:31.060Z · LW · GW

The Type II error of behaving as if these and future systems are not conscious in a world where they are in fact conscious.

Consciousness does not have a commonly agreed upon definition. The question of whether an AI is conscious cannot be answered until you choose a precise definition of consciousness, at which point the question falls out of the realm of philosophy into standard science.

This might seem like mere pedantry or missing the point, because the whole challenge is to figure out the definition of consciousness, but I think it is actually the central issue. People are grasping for some solution to the "hard problem" of capturing the je ne sais quoi of what it is like to be a thing, but they will not succeed until they deconfuse themselves about the intangible nature of sentience.

You cannot know about something unless it is somehow connected to the causal chain that led to the current state of your brain. If we know about a thing called "consciousness" then it is part of this causal chain. Therefore "consciousness", whatever it is, is a part of physics. There is no evidence for, and there cannot ever be evidence for, any kind of dualism or epiphenomenal consciousness. This leaves us to conclude that either panpsychism or materialism is correct. And causally-connected panpsychism is just materialism where we haven't discovered all the laws of physics yet. This is basically the argument for illusionism.

So "consciousness" is the algorithm that causes brains to say "I think therefore I am". Is there some secret sauce that makes this algorithm special and different from all currently known algorithms, such that if we understood it we would suddenly feel enlightened? I doubt it. I expect we will just find a big pile of heuristics and optimization procedures that are fundamentally familiar to computer science. Maybe you disagree, that's fine! But let's just be clear that that is what we're looking for, not some other magisterium.

If consciousness is indeed sufficient for moral patienthood, then the stakes seem remarkably high from a utilitarian perspective.

Agreed, if your utility function is that you like computations similar to the human experience of pleasure and dislike computations similar to the human experience of pain (mine is!). But again, let's not confuse ourselves by thinking there's some deep secret about the nature of reality to uncover. Your choice of meta-ethical system is of the same type signature as your choice of favorite food.

Comment by Joseph Miller (Josephm) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-18T20:28:22.372Z · LW · GW

The subfaction veto only applies to faction level policy. The faction veto is decided by pure democracy within the faction.

I would guess in most scenarios most subfactions would agree when to use the faction veto. Eg. all the Southern states didn't want to end slavery.

Comment by Joseph Miller (Josephm) on D0TheMath's Shortform · 2024-03-17T23:05:59.428Z · LW · GW

Yes, Garrett is referring to this post: https://www.lesswrong.com/posts/yi7shfo6YfhDEYizA/more-people-getting-into-ai-safety-should-do-a-phd

Comment by Joseph Miller (Josephm) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-17T23:02:59.288Z · LW · GW

Presumably the factions (eg. Southern states) also have sub factions, so maybe a better system would be described with the recursive acronym DVDF:
Democracy with Veto for each faction, plus DVDF within Factions.

Comment by Joseph Miller (Josephm) on Victor Ashioya's Shortform · 2024-03-15T13:24:22.765Z · LW · GW

What's up with Tim Cook not using buzzwords like AI and ML? There is definitely something cool and aloof about refusing to get sucked into the latest hype train and I guess Apple are the masters of branding.

Comment by Joseph Miller (Josephm) on AI #54: Clauding Along · 2024-03-11T14:11:32.941Z · LW · GW

Anthropic now has a highly impressive model, impressive enough that it seems as if it breaks at least the spirit of their past commitments on how far they will push the frontier.

Why do you not discuss this further? I want to hear your thoughts.

Comment by Joseph Miller (Josephm) on Vote on Anthropic Topics to Discuss · 2024-03-07T00:41:30.976Z · LW · GW

Would be nice to create prediction markets for some of these. Especially interested in the ones about pausing development.

Comment by Joseph Miller (Josephm) on The Parable Of The Fallen Pendulum - Part 1 · 2024-03-01T17:23:18.128Z · LW · GW

The students are correct to take this as evidence against the theory. However they can go back to the whiteboard, gain a full understanding of the theory, correct their experiment and subsequently collect overwhelming evidence to overcome their current distrust of the theory.

Comment by Joseph Miller (Josephm) on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-03-01T13:56:47.002Z · LW · GW

Yup I think I agree. However I could see this going wrong in some kind of slow takeoff world where the AI is already in charge of many things in the world.

Comment by Joseph Miller (Josephm) on Adding Sensors to Mandolin? · 2024-03-01T03:46:02.579Z · LW · GW

That video is pretty awesome. Would be great if you could make it a 4 part band by singing at the same time.

Comment by Joseph Miller (Josephm) on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-02-29T22:13:01.620Z · LW · GW

If an AI somehow (implicitly, in practice) kept track of all the plausible H’s, i.e., those with high probability under P(H | D), then there would be a perfectly safe way to act: if any of the plausible hypotheses predicted that some action caused a major harm (like the death of humans), then the AI should not choose that action. Indeed, if the correct hypothesis H* predicts harm, it means that some plausible H predicts harm. Showing that no such H exists therefore rules out the possibility that this action yields harm, and the AI can safely execute it.

This idea seems to ignore the problem that the null action can also entail harm. In a trolley problem this AI would never be able to pull the lever.

Maybe you could get around this by saying that it compares the entire wellbeing of the world with and without its intervention. But still in that case if it had any uncertainty as to which way had the most harm, it would be systematically biased toward inaction, even when the expected harm was clearly less if it took action.

Comment by Joseph Miller (Josephm) on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-02-29T21:56:46.514Z · LW · GW

To avoid catastrophic errors, now consider a risk management approach, with an AI that represents not a single H but a large set of them, in the form of a generative distribution over hypotheses H

On first reading this post, the whole proposal seemed so abstract that I wouldn't know how to even begin making such an AI. However after a very quick skim of some of Bengio's recent papers I think I have more of a sense for what he has in mind.

I think his approach is roughly to create a generative model that constructs Bayesian Networks edge by edge, where the likelihood of generating any given network represents the likelihood that that causal model is the correct hypothesis.

And he's using GFlowNets to do it, which are a new type of ML/RL model developed by MILA that generate objects with likelihood proportional to some reward function (unlike normal RL which always tries to achieve maximum reward). They seem to have mostly been used for biological problems so far.
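
To make the contrast concrete, here is a toy illustration (not a GFlowNet implementation) of the sampling objective: draw discrete objects with probability proportional to their reward, rather than always taking the argmax as a reward-maximising RL policy would.

```python
import torch

rewards = torch.tensor([1.0, 3.0, 6.0])   # e.g. unnormalised plausibility of three candidate causal graphs

argmax_choice = rewards.argmax()           # reward maximisation: always pick object 2
target_probs = rewards / rewards.sum()     # GFlowNet-style target: P(x) proportional to R(x)
samples = torch.multinomial(target_probs, num_samples=1000, replacement=True)
# Roughly 10% / 30% / 60% of the samples land on each object, so low-reward but
# still-plausible hypotheses keep getting represented.
```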

Comment by Joseph Miller (Josephm) on Speaking to Congressional staffers about AI risk · 2024-02-29T16:39:17.212Z · LW · GW

> I think I would've written up a doc that explained my reasoning, documented the people I consulted with, documented the upside and downside risks I was aware of, and sent it out to some EAs.

internally screaming

Can you please explain what this means?

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T20:09:44.944Z · LW · GW

I think it is the same. When training next-token predictors we model the ground-truth probability distribution as having probability 1 for the actual next token and 0 for all other tokens in the vocab. This is how the cross-entropy loss simplifies to negative log likelihood. You can see that the TransformerLens implementation doesn't match the equation for cross-entropy loss because it is using this simplification.

So the missing factor of P(x) would just be 1, I think.
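
A quick numerical check of that simplification (toy numbers, nothing model-specific):

```python
import torch
import torch.nn.functional as F

# Full cross-entropy: H(P, Q) = -sum_x P(x) * log Q(x).
# With a one-hot ground truth P (probability 1 on the actual next token,
# 0 elsewhere), every term where P(x) = 0 vanishes and the sum collapses
# to -log Q(correct token), i.e. the negative log likelihood.
logits = torch.randn(5)                       # toy logits over a 5-token vocab
target = torch.tensor(2)                      # the actual next token
P = F.one_hot(target, num_classes=5).float()  # one-hot ground truth
Q = F.softmax(logits, dim=-1)

full_ce = -(P * Q.log()).sum()                # full formula, P(x) factor included
nll = -Q[target].log()                        # the simplified form implementations use
assert torch.allclose(full_ce, nll)
```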

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T13:12:36.204Z · LW · GW

Has anyone tried training an SAE using the performance of the patched model as the loss function? I guess this would be a lot more expensive, but given that is the metric we actually care about, it seems sensible to optimise for it directly.
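
Roughly what I have in mind, sketched with hypothetical names (`sae`, `run_clean`, `run_patched`) since the details depend on the setup; the downstream KL term replaces, or is added to, the usual activation-space MSE:

```python
import torch
import torch.nn.functional as F

def downstream_sae_loss(run_clean, run_patched, sae, tokens, l1_coeff=1e-3):
    """Hypothetical sketch: train the SAE against the patched model's performance.

    run_clean(tokens) -> (logits, activations at the SAE's layer)
    run_patched(tokens, replacement) -> logits with that layer's activations replaced
    """
    with torch.no_grad():
        clean_logits, acts = run_clean(tokens)

    recon, feature_acts = sae(acts)              # reconstruction and feature activations
    patched_logits = run_patched(tokens, recon)  # gradients flow through the model into the SAE

    kl = F.kl_div(F.log_softmax(patched_logits, dim=-1).flatten(0, 1),
                  F.log_softmax(clean_logits, dim=-1).flatten(0, 1),
                  log_target=True, reduction="batchmean")
    sparsity = feature_acts.abs().sum(-1).mean() # keep the usual L1 sparsity penalty
    return kl + l1_coeff * sparsity
```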

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T13:08:53.683Z · LW · GW

Am I right in thinking that your ΔCE metric is equivalent to the KL Divergence between the SAE patched model and the normal model?

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T13:05:47.926Z · LW · GW

Could the problem with long context lengths simply be that the positional embedding cannot be represented by the SAE? You could test this by manually adding the positional embedding vector when you patch in the SAE reconstruction.

Comment by Joseph Miller (Josephm) on China-AI forecasts · 2024-02-26T03:11:43.708Z · LW · GW

The way that billionaires make their money in the US is typically by creating new companies.

Are you sure? Lars Doucet disagrees:

One of the big misapprehensions people have is that, when they think of billionaires, they think of people like Bill Gates and Elon Musk and Jeff Bezos. Those are actually the minority billionaires, most billionaires are people involved in hedge funds, they are bankers. And what are two thirds of banks? It's real estate. 

Comment by Joseph Miller (Josephm) on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T19:44:55.754Z · LW · GW

FYI I think this post is getting few upvotes because it doesn't contribute anything new to the alignment discussion.  This point has already been written about many times before.

As long as we have it under some level of control, we can just say, “Hey, act ethically, OK?,”

Yes but the whole alignment problem is to get an ASI under some level of control.

Comment by Joseph Miller (Josephm) on The Pointer Resolution Problem · 2024-02-18T17:27:14.432Z · LW · GW

One option is to spend a lot of time explaining why “soul” isn’t actually the thing in the territory they care about, and talk about moral patienthood and theories of welfare and moral status.

In my opinion, the concepts of moral patienthood and theories of moral status are about as confused as the idea of souls.

Comment by Joseph Miller (Josephm) on Value learning in the absence of ground truth · 2024-02-06T14:45:28.279Z · LW · GW

My understanding is that the core alignment problem is giving an AGI any goal at all (hence the diamond-alignment problem). A superintelligent AGI will know better than we do what we desire, so if we simply had the ability to give the AI instructions in natural language and have it execute them to the best of its ability, we would not have to figure out the correct human values.

Comment by Joseph Miller (Josephm) on Implementing activation steering · 2024-02-05T21:54:11.552Z · LW · GW

One pro of wrapper functions is that you can find the gradient of the steering vector.
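
A minimal sketch of what I mean, assuming the wrapped block returns a plain tensor (names are mine): because the steering vector is a parameter inside the forward pass, you can backprop any scalar objective into it.

```python
import torch
import torch.nn as nn

class SteeringWrapper(nn.Module):
    """Wraps a transformer block and adds a learnable steering vector to its output."""

    def __init__(self, block: nn.Module, d_model: int):
        super().__init__()
        self.block = block
        self.steering_vector = nn.Parameter(torch.zeros(d_model))

    def forward(self, x, *args, **kwargs):
        out = self.block(x, *args, **kwargs)      # assumes the block returns a tensor
        return out + self.steering_vector         # broadcasts over batch and sequence

# After computing some scalar loss from the wrapped model's output,
# loss.backward() populates wrapper.steering_vector.grad, so you can optimise
# the vector directly instead of only crafting it from contrastive prompts.
```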

Comment by Joseph Miller (Josephm) on Implementing activation steering · 2024-02-05T21:51:57.529Z · LW · GW

From the title I thought this post was going to be different techniques for finding steering vectors (eg. mean-centered, crafting prompts, etc.) which I think would also be very useful.

Comment by Joseph Miller (Josephm) on Fact Finding: Simplifying the Circuit (Post 2) · 2024-01-18T14:25:32.146Z · LW · GW

Ok thanks!

Comment by Joseph Miller (Josephm) on Fact Finding: Simplifying the Circuit (Post 2) · 2024-01-17T00:38:20.672Z · LW · GW

What's up with the <pad> token (<pad>==<bos>==<eos> in Pythia) in the attention diagram? I assume that doesn't need to be there?

Comment by Joseph Miller (Josephm) on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-12T22:18:44.296Z · LW · GW

This is a great post.

Comment by Joseph Miller (Josephm) on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2023-12-20T19:12:22.580Z · LW · GW

Any time the embeddings / residual stream vectors are used for anything, they are projected onto the surface of a  dimensional hypersphere. This changes the geometry.
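
A small sketch of the normalisation step (ignoring the learned scale and bias), just to show the projection:

```python
import torch

d_model = 768
x = torch.randn(5, d_model) * torch.rand(5, 1) * 100  # residual vectors with very different norms

# Core of LayerNorm: subtract the mean and divide by the (population) standard
# deviation across the d_model dimension.
centered = x - x.mean(dim=-1, keepdim=True)
normed = centered / centered.std(dim=-1, keepdim=True, unbiased=False)

# Every vector now has norm sqrt(d_model) (~27.7 here), i.e. it has been
# projected onto the surface of a fixed hypersphere.
print(normed.norm(dim=-1))
```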

Comment by Joseph Miller (Josephm) on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2023-12-16T13:06:29.509Z · LW · GW

I haven't read this properly but my guess is that this whole analysis is importantly wrong to some extent because you haven't considered layernorm. It only makes sense to interpret embeddings in the layernorm space.

Edit: I have now read most of this and I don't think anything you say is wrong exactly, but I do think layernorm is playing a crucial role that you should not be ignoring.

But the post is still super interesting!

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:53:24.505Z · LW · GW

I actually upvoted

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:53:05.206Z · LW · GW

Writing style / tone

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:50:47.336Z · LW · GW

Other