Posts

Global Pause AI Protest 10/21 2023-10-14T03:20:27.937Z
The International PauseAI Protest: Activism under uncertainty 2023-10-12T17:36:15.716Z
Even Superhuman Go AIs Have Surprising Failure Modes 2023-07-20T17:31:35.814Z
We Found An Neuron in GPT-2 2023-02-11T18:27:29.410Z

Comments

Comment by Joseph Miller (Josephm) on Thoughts on seed oil · 2024-04-23T21:41:31.641Z · LW · GW

If you want to be healthier, we know ways you can change your diet that will help: Increase your overall diet “quality”. Eat lots of fruits and vegetables. Avoid processed food. Especially avoid processed meats. Eat food with low caloric density. Avoid added sugar. Avoid alcohol. Avoid processed food.

I'm confused: why are you so confident that we should avoid processed food? Isn't the whole point of your post that we don't know whether processed oil is bad for you? Where's the overwhelming evidence that processed food in general is bad?

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-13T19:30:13.583Z · LW · GW

Reconstruction loss is the CE loss of the patched model

If this is accurate then I agree that this is not the same as "the KL Divergence between the normal model and the model when you patch in the reconstructed activations". But Fengyuan described reconstruction score as: 

measures how replacing activations changes the total loss of the model

which I still claim is equivalent.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-10T19:14:00.832Z · LW · GW

I think just showing  would be better than reconstruction score metric because  is very noisy.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T22:56:35.343Z · LW · GW

there is a validation metric called reconstruction score that measures how replacing activations changes the total loss of the model

That's equivalent to the KL metric. Would be good to include as I think it's the most important metric of performance.

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T15:00:04.834Z · LW · GW

Patch loss is different to L2. It's the KL Divergence between the normal model and the model when you patch in the reconstructed activations at some layer.
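
For concreteness, here's a minimal sketch of that metric as I understand it (not anyone's actual code; the SAE here is an untrained placeholder standing in for a trained sparse autoencoder):

```python
import torch
import torch.nn.functional as F
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
hook_name = utils.get_act_name("resid_pre", 6)  # patch the residual stream at layer 6

# Placeholder autoencoder: in practice this would be the trained SAE being evaluated.
d_model, d_sae = model.cfg.d_model, 8 * model.cfg.d_model
sae = torch.nn.Sequential(torch.nn.Linear(d_model, d_sae), torch.nn.ReLU(),
                          torch.nn.Linear(d_sae, d_model))

def patch_reconstruction(activation, hook):
    # Replace the residual stream with the SAE's reconstruction of it.
    return sae(activation)

tokens = model.to_tokens("An example sentence for evaluating the reconstruction.")
with torch.no_grad():
    clean_logits = model(tokens)
    patched_logits = model.run_with_hooks(
        tokens, fwd_hooks=[(hook_name, patch_reconstruction)]
    )

# KL(normal model || patched model), averaged over token positions.
patch_loss = F.kl_div(
    patched_logits.log_softmax(dim=-1),
    clean_logits.log_softmax(dim=-1),
    log_target=True,
    reduction="none",
).sum(dim=-1).mean()
print(patch_loss.item())
```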

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T13:42:27.249Z · LW · GW

It would be good to benchmark the normalized and baseline SAEs using the standard metrics of patch loss and L0.
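
By L0 I mean the average number of SAE features active per token, which is cheap to compute. A quick sketch with stand-in activations (since I obviously don't have the SAEs from the post):

```python
import torch

# Stand-in for the SAE's hidden activations, shape [batch, seq, d_sae].
feature_acts = torch.relu(torch.randn(4, 128, 24576))

# L0: average number of features active per token.
l0 = (feature_acts > 0).float().sum(dim=-1).mean()
print(f"L0 = {l0.item():.1f} active features per token")
```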

Comment by Joseph Miller (Josephm) on Normalizing Sparse Autoencoders · 2024-04-08T13:40:46.297Z · LW · GW

What is Neuron Activity?

Comment by Joseph Miller (Josephm) on Gradient Descent on the Human Brain · 2024-04-02T13:00:03.614Z · LW · GW

Does anyone think this could actually be done in <20 years?

Comment by Joseph Miller (Josephm) on AE Studio @ SXSW: We need more AI consciousness research (and further resources) · 2024-03-27T22:11:24.540Z · LW · GW

materialism where we haven't discovered all the laws of physics yet — specifically, those that constitute the sought-for materialist explanation of consciousness

It seems unlikely that new laws of physics are required to understand consciousness? My claim is that understanding consciousness just requires us to understand the algorithms in the brain.

Without that real explanation, “atoms!” or “materialism!”, is just a label plastered over our ignorance.

Agreed. I don't think this contradicts what I wrote (not sure if that was the implication).

Comment by Joseph Miller (Josephm) on AE Studio @ SXSW: We need more AI consciousness research (and further resources) · 2024-03-26T22:21:31.060Z · LW · GW

The Type II error of behaving as if these and future systems are not conscious in a world where they are in fact conscious.

Consciousness does not have a commonly agreed upon definition. The question of whether an AI is conscious cannot be answered until you choose a precise definition of consciousness, at which point the question falls out of the realm of philosophy into standard science.

This might seem like mere pedantry or missing the point, because the whole challenge is to figure out the definition of consciousness, but I think it is actually the central issue. People are grasping for some solution to the "hard problem" of capturing the je ne sais quoi of what it is like to be a thing, but they will not succeed until they deconfuse themselves about the intangible nature of sentience.

You cannot know about something unless it is somehow connected to the causal chain that led to the current state of your brain. If we know about a thing called "consciousness" then it is part of this causal chain. Therefore "consciousness", whatever it is, is a part of physics. There is no evidence for, and there cannot ever be evidence for, any kind of dualism or epiphenomenal consciousness. This leaves us to conclude that either panpsychism or materialism is correct. And causally-connected panpsychism is just materialism where we haven't discovered all the laws of physics yet. This is basically the argument for illusionism.

So "consciousness" is the algorithm that causes brains to say "I think therefore I am". Is there some secret sauce that makes this algorithm special and different from all currently known algorithms, such that if we understood it we would suddenly feel enlightened? I doubt it. I expect we will just find a big pile of heuristics and optimization procedures that are fundamentally familiar to computer science. Maybe you disagree, that's fine! But let's just be clear that that is what we're looking for, not some other magisterium.

If consciousness is indeed sufficient for moral patienthood, then the stakes seem remarkably high from a utilitarian perspective.

Agreed, if your utility function is that you like computations similar to the human experience of pleasure and dislike computations similar to the human experience of pain (mine is!). But again, let's not confuse ourselves by thinking there's some deep secret about the nature of reality to uncover. Your choice of meta-ethical system has the same type signature as your choice of favorite food.

Comment by Joseph Miller (Josephm) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-18T20:28:22.372Z · LW · GW

The subfaction veto only applies to faction level policy. The faction veto is decided by pure democracy within the faction.

I would guess in most scenarios most subfactions would agree when to use the faction veto. Eg. all the Southern states didn't want to end slavery.

Comment by Joseph Miller (Josephm) on D0TheMath's Shortform · 2024-03-17T23:05:59.428Z · LW · GW

Yes, Garrett is referring to this post: https://www.lesswrong.com/posts/yi7shfo6YfhDEYizA/more-people-getting-into-ai-safety-should-do-a-phd

Comment by Joseph Miller (Josephm) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-17T23:02:59.288Z · LW · GW

Presumably the factions (e.g. Southern states) also have subfactions, so maybe a better system would be described with the recursive acronym DVDF:
Democracy with Veto for each faction, plus DVDF within Factions.

Comment by Joseph Miller (Josephm) on Victor Ashioya's Shortform · 2024-03-15T13:24:22.765Z · LW · GW

What's up with Tim Cook not using buzzwords like AI and ML? There is definitely something cool and aloof about refusing to get sucked into the latest hype train and I guess Apple are the masters of branding.

Comment by Joseph Miller (Josephm) on AI #54: Clauding Along · 2024-03-11T14:11:32.941Z · LW · GW

Anthropic now has a highly impressive model, impressive enough that it seems as if it breaks at least the spirit of their past commitments on how far they will push the frontier.

Why do you not discuss this further? I want to hear your thoughts.

Comment by Joseph Miller (Josephm) on Vote on Anthropic Topics to Discuss · 2024-03-07T00:41:30.976Z · LW · GW

Would be nice to create prediction markets for some of these. Especially interested in the ones about pausing development.

Comment by Joseph Miller (Josephm) on The Parable Of The Fallen Pendulum - Part 1 · 2024-03-01T17:23:18.128Z · LW · GW

The students are correct to take this as evidence against the theory. However they can go back to the whiteboard, gain a full understanding of the theory, correct their experiment and subsequently collect overwhelming evidence to overcome their current distrust of the theory.

Comment by Joseph Miller (Josephm) on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-03-01T13:56:47.002Z · LW · GW

Yup I think I agree. However I could see this going wrong in some kind of slow takeoff world where the AI is already in charge of many things in the world.

Comment by Joseph Miller (Josephm) on Adding Sensors to Mandolin? · 2024-03-01T03:46:02.579Z · LW · GW

That video is pretty awesome. Would be great if you could make it a 4-part band by singing at the same time.

Comment by Joseph Miller (Josephm) on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-02-29T22:13:01.620Z · LW · GW

If an AI somehow (implicitly, in practice) kept track of all the plausible H’s, i.e., those with high probability under P(H | D), then there would be a perfectly safe way to act: if any of the plausible hypotheses predicted that some action caused a major harm (like the death of humans), then the AI should not choose that action. Indeed, if the correct hypothesis H* predicts harm, it means that some plausible H predicts harm. Showing that no such H exists therefore rules out the possibility that this action yields harm, and the AI can safely execute it.

This idea seems to ignore the problem that the null action can also entail harm. In a trolley problem this AI would never be able to pull the lever.

Maybe you could get around this by saying that it compares the entire wellbeing of the world with and without its intervention. But still in that case if it had any uncertainty as to which way had the most harm, it would be systematically biased toward inaction, even when the expected harm was clearly less if it took action.
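
To make the concern concrete, here's a toy sketch of the quoted decision rule (helper names are hypothetical, not from the paper):

```python
def safe_to_execute(action, plausible_hypotheses, predicts_harm) -> bool:
    """Allow an action only if no plausible hypothesis predicts it causes major harm."""
    return not any(predicts_harm(h, action) for h in plausible_hypotheses)

def choose_action(actions, plausible_hypotheses, predicts_harm):
    safe = [a for a in actions if safe_to_execute(a, plausible_hypotheses, predicts_harm)]
    # In a trolley problem both "pull lever" and "do nothing" are predicted to cause
    # harm under every hypothesis, so `safe` is empty and the rule gives no guidance.
    return safe
```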

Comment by Joseph Miller (Josephm) on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-02-29T21:56:46.514Z · LW · GW

To avoid catastrophic errors, now consider a risk management approach, with an AI that represents not a single H but a large set of them, in the form of a generative distribution over hypotheses H

On first reading this post, the whole proposal seemed so abstract that I wouldn't know how to even begin making such an AI. However after a very quick skim of some of Bengio's recent papers I think I have more of a sense for what he has in mind.

I think his approach is roughly to create a generative model that constructs Bayesian Networks edge by edge, where the likelihood of generating any given network represents the likelihood that that causal model is the correct hypothesis.

And he's using GFlowNets to do it, which are a new type of ML/RL model developed by MILA that generate objects with likelihood proportional to some reward function (unlike normal RL which always tries to achieve maximum reward). They seem to have mostly been used for biological problems so far.
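
The following is only my toy reconstruction of the idea, not Bengio's actual setup (the reward function and architecture are made up): a GFlowNet-style sampler that builds a graph hypothesis edge by edge and is trained with the trajectory-balance objective, so that hypotheses end up sampled roughly in proportion to a reward standing in for P(H | D).

```python
import torch
import torch.nn as nn

N_EDGES = 6  # possible directed edges among 3 variables, decided in a fixed order

def reward(edges: torch.Tensor) -> torch.Tensor:
    # Stand-in for the unnormalised posterior P(H | D) over graph hypotheses.
    scores = torch.tensor([1.5, -0.5, 0.7, 0.0, -1.0, 2.0])
    return torch.exp((edges * scores).sum())

# Forward policy: given the partial graph and which edge slot is next,
# output the probability of including that edge.
policy = nn.Sequential(nn.Linear(2 * N_EDGES, 64), nn.ReLU(), nn.Linear(64, 1))
log_Z = nn.Parameter(torch.zeros(()))  # learned log partition function
opt = torch.optim.Adam([*policy.parameters(), log_Z], lr=1e-2)

for step in range(2000):
    edges = torch.zeros(N_EDGES)
    log_pf = torch.zeros(())
    for i in range(N_EDGES):
        pos = torch.zeros(N_EDGES)
        pos[i] = 1.0
        p = torch.sigmoid(policy(torch.cat([edges, pos])))[0]
        include = torch.bernoulli(p).detach()
        log_pf = log_pf + torch.log((p if include > 0 else 1 - p).clamp_min(1e-6))
        edges[i] = include
    # Trajectory balance: (log Z + log P_F(trajectory) - log R(x))^2.
    # The backward-policy term drops out because edges are decided in a fixed order.
    loss = (log_Z + log_pf - torch.log(reward(edges))) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```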

Comment by Joseph Miller (Josephm) on Speaking to Congressional staffers about AI risk · 2024-02-29T16:39:17.212Z · LW · GW

> I think I would've written up a doc that explained my reasoning, documented the people I consulted with, documented the upside and downside risks I was aware of, and sent it out to some EAs.

internally screaming

Can you please explain what this means?

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T20:09:44.944Z · LW · GW

I think it is the same. When training next-token predictors we model the ground truth probability distribution as having probability 1 for the actual next token and 0 for all other tokens in the vocab. This is how the cross-entropy loss simplifies to negative log likelihood. You can see that the TransformerLens implementation doesn't match the general equation for cross-entropy loss because it is using this simplification.

So the missing factor of p(x) would just be 1, I think.
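
A quick standalone sanity check of that simplification:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 50257)         # fake next-token logits over a GPT-2-sized vocab
target = torch.tensor([42])            # index of the actual next token

# Full cross-entropy against the one-hot "ground truth" distribution...
full_ce = -(F.one_hot(target, 50257).float() * logits.log_softmax(dim=-1)).sum()
# ...equals the negative log-probability of the actual token, which is what
# implementations compute directly.
nll = F.cross_entropy(logits, target)
print(torch.allclose(full_ce, nll))    # True
```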

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T13:12:36.204Z · LW · GW

Has anyone tried training an SAE using the performance of the patched model as the loss function? I guess this would be a lot more expensive, but given that this is the metric we actually care about, it seems sensible to optimise for it directly.
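
Roughly what I have in mind, as a sketch (frozen model, untrained placeholder SAE, sparsity penalty omitted), not a claim about how anyone currently trains SAEs:

```python
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
model.requires_grad_(False)  # only the SAE is trained

d_model, d_sae = model.cfg.d_model, 8 * model.cfg.d_model
sae = torch.nn.Sequential(torch.nn.Linear(d_model, d_sae), torch.nn.ReLU(),
                          torch.nn.Linear(d_sae, d_model))
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
hook_name = utils.get_act_name("resid_pre", 6)

def patch(act, hook):
    # Replace the residual stream with the SAE reconstruction; gradients flow to the SAE.
    return sae(act)

tokens = model.to_tokens("Some training text would go here.")
for step in range(10):
    loss = model.run_with_hooks(tokens, return_type="loss",
                                fwd_hooks=[(hook_name, patch)])
    # An L1 sparsity penalty on the SAE's hidden activations would be added here.
    opt.zero_grad()
    loss.backward()
    opt.step()
```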

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T13:08:53.683Z · LW · GW

Am I right in thinking that your ΔCE metric is equivalent to the KL Divergence between the SAE patched model and the normal model?

Comment by Joseph Miller (Josephm) on Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders · 2024-02-28T13:05:47.926Z · LW · GW

Could the problem with long context lengths simply be that the positional embedding cannot be represented by the SAE? You could test this by manually adding the positional embedding vector when you patch in the SAE reconstruction.
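
A rough sketch of that test (untrained placeholder SAE, TransformerLens-style hooks, where model.W_pos is the learned positional embedding matrix):

```python
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
d_model, d_sae = model.cfg.d_model, 8 * model.cfg.d_model
sae = torch.nn.Sequential(torch.nn.Linear(d_model, d_sae), torch.nn.ReLU(),
                          torch.nn.Linear(d_sae, d_model))  # stand-in for a trained SAE

hook_name = utils.get_act_name("resid_pre", 6)
tokens = model.to_tokens("A long prompt would go here " * 50)

def patch_plus_pos(act, hook):
    seq_len = act.shape[1]
    pos_embed = model.W_pos[:seq_len]   # [seq, d_model]
    return sae(act) + pos_embed         # add back positions the SAE may have dropped

with torch.no_grad():
    loss = model.run_with_hooks(tokens, return_type="loss",
                                fwd_hooks=[(hook_name, patch_plus_pos)])
print(loss.item())
```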

Comment by Joseph Miller (Josephm) on China-AI forecasts · 2024-02-26T03:11:43.708Z · LW · GW

The way that billionaires make their money in the US is typically by creating new companies.

Are you sure? Lars Doucet disagrees:

One of the big misapprehensions people have is that, when they think of billionaires, they think of people like Bill Gates and Elon Musk and Jeff Bezos. Those are actually the minority billionaires, most billionaires are people involved in hedge funds, they are bankers. And what are two thirds of banks? It's real estate. 

Comment by Joseph Miller (Josephm) on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T19:44:55.754Z · LW · GW

FYI I think this post is getting few upvotes because it doesn't contribute anything new to the alignment discussion.  This point has already been written about many times before.

As long as we have it under some level of control, we can just say, “Hey, act ethically, OK?,”

Yes but the whole alignment problem is to get an ASI under some level of control.

Comment by Joseph Miller (Josephm) on The Pointer Resolution Problem · 2024-02-18T17:27:14.432Z · LW · GW

One option is to spend a lot of time explaining why “soul” isn’t actually the thing in the territory they care about, and talk about moral patienthood and theories of welfare and moral status.

In my opinion, the concepts of moral patienthood and theories of moral status are about as confused as the idea of souls.

Comment by Joseph Miller (Josephm) on Value learning in the absence of ground truth · 2024-02-06T14:45:28.279Z · LW · GW

My understanding of the core alignment problem is giving an AGI any goal at all (hence the diamond-alignment problem). A superintelligent AGI will know better than we do what we desire, so if we simply had the ability to give the AI instructions in natural language and have it execute them to the best of its ability, we would not have to figure out the correct human values.

Comment by Joseph Miller (Josephm) on Implementing activation steering · 2024-02-05T21:54:11.552Z · LW · GW

One pro of wrapper functions is that you can find the gradient of the steering vector.
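
For example, a minimal sketch of such a wrapper (the layer path in the usage comment is hypothetical):

```python
import torch
import torch.nn as nn

class SteeredBlock(nn.Module):
    def __init__(self, block: nn.Module, d_model: int, scale: float = 1.0):
        super().__init__()
        self.block = block
        self.scale = scale
        self.steering_vector = nn.Parameter(torch.zeros(d_model))

    def forward(self, x, *args, **kwargs):
        out = self.block(x, *args, **kwargs)
        hidden = out[0] if isinstance(out, tuple) else out
        hidden = hidden + self.scale * self.steering_vector
        return (hidden, *out[1:]) if isinstance(out, tuple) else hidden

# Usage with a HuggingFace-style model (hypothetical layer path):
# model.transformer.h[6] = SteeredBlock(model.transformer.h[6], d_model=768)
# loss = model(input_ids, labels=input_ids).loss
# loss.backward()
# grad = model.transformer.h[6].steering_vector.grad  # gradient w.r.t. the steering vector
```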

Comment by Joseph Miller (Josephm) on Implementing activation steering · 2024-02-05T21:51:57.529Z · LW · GW

From the title I thought this post was going to be different techniques for finding steering vectors (eg. mean-centered, crafting prompts, etc.) which I think would also be very useful.

Comment by Joseph Miller (Josephm) on Fact Finding: Simplifying the Circuit (Post 2) · 2024-01-18T14:25:32.146Z · LW · GW

Ok thanks!

Comment by Joseph Miller (Josephm) on Fact Finding: Simplifying the Circuit (Post 2) · 2024-01-17T00:38:20.672Z · LW · GW

What's up with the <pad> token (<pad>==<bos>==<eos> in Pythia) in the attention diagram? I assume that doesn't need to be there?

Comment by Joseph Miller (Josephm) on An Actually Intuitive Explanation of the Oberth Effect · 2024-01-12T22:18:44.296Z · LW · GW

This is a great post.

Comment by Joseph Miller (Josephm) on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2023-12-20T19:12:22.580Z · LW · GW

Any time the embeddings / residual stream vectors are used for anything, they are projected onto the surface of a hypersphere. This changes the geometry.
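
A small illustration (not tied to any particular model): LayerNorm maps every input vector to a point with the same norm, i.e. onto a hypersphere, regardless of the vector's original magnitude.

```python
import torch

d_model = 768
ln = torch.nn.LayerNorm(d_model, elementwise_affine=False)
x = torch.randn(5, d_model) * torch.tensor([0.1, 1.0, 10.0, 100.0, 1000.0]).unsqueeze(1)
print(ln(x).norm(dim=-1))  # all ≈ sqrt(d_model): the outputs lie on the same hypersphere
```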

Comment by Joseph Miller (Josephm) on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2023-12-16T13:06:29.509Z · LW · GW

I haven't read this properly but my guess is that this whole analysis is importantly wrong to some extent because you haven't considered layernorm. It only makes sense to interpret embeddings in the layernorm space.

Edit: I have now read most of this and I don't think anything you say is wrong exactly, but I do think layernorm is playing a crucial role that you should not be ignoring.

But the post is still super interesting!

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:53:24.505Z · LW · GW

I actually upvoted

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:53:05.206Z · LW · GW

Writing style / tone

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:50:47.336Z · LW · GW

Other

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:50:36.782Z · LW · GW

Post is boring / obvious / doesn't have new ideas

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:49:34.575Z · LW · GW

Post is wrong / misleading

Comment by Joseph Miller (Josephm) on Focus on existential risk is a distraction from the real issues. A false fallacy · 2023-10-31T10:49:15.625Z · LW · GW

I'm interested in why this is downvoted so much. Upvote the child comment that best matches your reason for downvoting.

Comment by Joseph Miller (Josephm) on EPUBs of MIRI Blog Archives and selected LW Sequences · 2023-10-27T02:59:51.549Z · LW · GW

Has anyone made a good EPUB of Planecrash / Mad Investor Chaos yet?

Comment by Joseph Miller (Josephm) on Sam Altman's sister, Annie Altman, claims Sam has severely abused her · 2023-10-07T21:38:01.228Z · LW · GW

Can anyone comment on the likelihood of her forgetting the abuse she experienced as a 4 year old and then remembering it at ~26 years old? Given the other circumstances this seems quite likely to be a false memory, but I am not familiar with the research on this topic.

Comment by Joseph Miller (Josephm) on Towards Monosemanticity: Decomposing Language Models With Dictionary Learning · 2023-10-07T00:45:58.711Z · LW · GW

Looking at their interactive visualization, I was surprised how clean random learned features are.

Comment by Joseph Miller (Josephm) on PSA: The community is in Berkeley/Oakland, not "the Bay Area" · 2023-10-04T23:36:13.191Z · LW · GW

More than London?

Comment by Joseph Miller (Josephm) on AI pause/governance advocacy might be net-negative, especially without focus on explaining the x-risk · 2023-08-29T13:45:26.660Z · LW · GW

This seems to me an instantiation of a classic debate about realpolitik.

I disagree with the main point in this post because raising concerns over x-risk is not mutually exclusive with advocating for more palatable policies (such as requiring evals before deployment). I think the actual thing that many EAs are trying to do is to talk loudly about near term policies while also mentioning x-risk concerns to the extent that they think is currently politically useful. The aim of this is to slow down AI progress (giving us more time to find a permanent solution), gain traction within the political system and actually make AI safer (although if alignment is hard then these policies may not actually reduce x-risk directly).

Gaining knowledge, experience and contacts in AI policy making will make it easier to advocate policies that actually deal with x-risk in the future. The concern about being seen as dishonest for not raising x-risk sooner feels unrealistic to me because it is so standard in public discourse to say something not because you believe it but because it aligns with your tribe (i.e. operating at higher Simulacrum Levels).

In summary

Implement as much AI regulation as you can today, while gaining influence and gradually raising the salience of x-risk so that you can implement better regulation in the future.

seems like a reasonable strategy and better than the proposed alternative of

Only communicate x-risk concerns to policy makers.

Comment by Joseph Miller (Josephm) on Your posts should be on arXiv · 2023-04-16T01:40:25.781Z · LW · GW

Any update on when this might happen?

Comment by Joseph Miller (Josephm) on GPT-4: What we (I) know about it · 2023-03-15T21:08:28.062Z · LW · GW

In a transformer, the compute cost for context length n grows at O(n^2)[4], so it's a 16x increase in compute cost to go from 2000 tokens to 8000, and another 16x increase to go to 32000. To the best of my knowledge, there isn't much additional cost to a longer context window - the number of parameters to encode more positions is very small for a model this big.

I do not understand this paragraph; it seems like the first sentence contradicts the second.

Edit: I think I understand. Are you saying there isn't much additional cost on top of the cost mentioned in the previous sentence because the position encoding is tiny compared to everything else in the model?