Posts

What is the alpha in one bit of evidence? 2024-10-22T21:57:09.056Z
The Best Lay Argument is not a Simple English Yud Essay 2024-09-10T17:34:28.422Z
Attention-Feature Tables in Gemma 2 Residual Streams 2024-08-06T22:56:40.828Z
Risk Overview of AI in Bio Research 2024-07-15T00:04:41.818Z
How to Better Report Sparse Autoencoder Performance 2024-06-02T19:34:22.803Z
To Limit Impact, Limit KL-Divergence 2024-05-18T18:52:39.081Z
Introducing Statistical Utility Mechanics: A Framework for Utility Maximizers 2024-05-15T21:56:48.950Z
Taming Infinity (Stat Mech Part 3) 2024-05-15T21:43:03.406Z
Conserved Quantities (Stat Mech Part 2) 2024-05-04T13:40:55.825Z
So What's Up With PUFAs Chemically? 2024-04-27T13:32:52.159Z
Forget Everything (Statistical Mechanics Part 1) 2024-04-22T13:33:35.446Z
Measuring Learned Optimization in Small Transformer Models 2024-04-08T14:41:27.669Z
Briefly Extending Differential Optimization to Distributions 2024-03-10T20:41:09.551Z
Finite Factored Sets to Bayes Nets Part 2 2024-02-03T12:25:41.444Z
From Finite Factors to Bayes Nets 2024-01-23T20:03:51.845Z
Differential Optimization Reframes and Generalizes Utility-Maximization 2023-12-27T01:54:22.731Z
Mathematically-Defined Optimization Captures A Lot of Useful Information 2023-10-29T17:17:03.211Z
Defining Optimization in a Deeper Way Part 4 2022-07-28T17:02:33.411Z
Defining Optimization in a Deeper Way Part 3 2022-07-20T22:06:48.323Z
Defining Optimization in a Deeper Way Part 2 2022-07-11T20:29:30.225Z
Defining Optimization in a Deeper Way Part 1 2022-07-01T14:03:18.945Z
Thinking about Broad Classes of Utility-like Functions 2022-06-07T14:05:51.807Z
The Halting Problem and the Impossible Photocopier 2022-03-31T18:19:20.292Z
Why Do I Think I Have Values? 2022-02-03T13:35:07.656Z
Knowledge Localization: Tentatively Positive Results on OCR 2022-01-30T11:57:19.151Z
Deconfusing Deception 2022-01-29T16:43:53.750Z
[Book Review]: The Bonobo and the Atheist by Frans De Waal 2022-01-05T22:29:32.699Z
DnD.Sci GURPS Evaluation and Ruleset 2021-12-22T19:05:46.205Z
SGD Understood through Probability Current 2021-12-19T23:26:23.455Z
Housing Markets, Satisficers, and One-Track Goodhart 2021-12-16T21:38:46.368Z
D&D.Sci GURPS Dec 2021: Hunters of Monsters 2021-12-11T12:13:02.574Z
Hypotheses about Finding Knowledge and One-Shot Causal Entanglements 2021-12-01T17:01:44.273Z
Relying on Future Creativity 2021-11-30T20:12:43.468Z
Nightclubs in Heaven? 2021-11-05T23:28:19.461Z
I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness 2021-10-29T11:09:20.559Z
Nanosystems are Poorly Abstracted 2021-10-24T10:44:27.934Z
No Really, There Are No Rules! 2021-10-07T22:08:13.834Z
Modelling and Understanding SGD 2021-10-05T13:41:22.562Z
[Book Review] "I Contain Multitudes" by Ed Yong 2021-10-04T19:29:55.205Z
Reachability Debates (Are Often Invisible) 2021-09-27T22:05:06.277Z
A Confused Chemist's Review of AlphaFold 2 2021-09-27T11:10:16.656Z
How to Find a Problem 2021-09-08T20:05:45.835Z
A Taxonomy of Research 2021-09-08T19:30:52.194Z
Addendum to "Amyloid Plaques: Medical Goodhart, Chemical Streetlight" 2021-09-02T17:42:02.910Z
Good software to draw and manipulate causal networks? 2021-09-02T14:05:18.389Z
Amyloid Plaques: Chemical Streetlight, Medical Goodhart 2021-08-26T21:25:04.804Z
Generator Systems: Coincident Constraints 2021-08-23T20:37:38.235Z
Fudging Work and Rationalization 2021-08-13T19:51:44.531Z
The Reductionist Trap 2021-08-09T17:00:56.699Z
Uncertainty can Defuse Logical Explosions 2021-07-30T12:36:29.875Z

Comments

Comment by J Bostock (Jemist) on Don't Associate AI Safety With Activism · 2024-12-19T14:23:47.293Z · LW · GW

Since I'm actually in that picture (I am the one with the hammer) I feel an urge to respond to this post. The following is not the entire endorsed and edited worldview/theory of change of Pause AI; these are my own views. They may also not be as well thought-out as they could be.

Why do you think "activists have an aura of evil about them"? In the UK, where I'm based, we usually see a large march/protest/demonstration every week. Most of the time, the people who agree with the activists are vaguely positive, the people who disagree with them are vaguely negative, and both stick to discussing the goals. If you could convince me that people generally thought we were evil upon hearing about us, just because we were activists (IME most people either see us as naïve, have specific concerns relating to technical issues, or weirdly think we're in the pocket of Elon Musk -- which we aren't), then I would seriously update my views on our effectiveness.

One of my views is that there are lots of people adjacent to power, or adjacent to influence, who are pretty AI-risk-pilled, but who can't come out and say so without burning social capital. I think we are probably net positive in this regard, because every article about us makes the issue more salient in the public eye.

Adjacently, one common retort I've heard politicians give to lobbyists is "if this is important, where are the protests?" And while this might not be the true rejection, I still think it's worth actually doing the protests in the meantime.

Regarding aesthetics specifically, yes we do attempt to borrow the aesthetics of movements like XR. This is to make it more obvious what we're doing and create more compelling scenes and images.

(Edited because I posted half of the comment by mistake)

Comment by J Bostock (Jemist) on Ablations for “Frontier Models are Capable of In-context Scheming” · 2024-12-19T00:23:09.949Z · LW · GW

This, more than the original paper, or the recent Anthropic paper, is the most convincingly-worrying example of AI scheming/deception I've seen. This will be my new go-to example in most discussions. This comes from first considering a model property which is both deeply and shallowly worrying, then robustly eliciting it, and finally ruling out alternative hypotheses.

Comment by J Bostock (Jemist) on Biological risk from the mirror world · 2024-12-13T19:11:03.066Z · LW · GW

I think it's very unlikely that a mirror bacterium would be a threat: <1% chance of a mirror-clone being a meaningfully more serious threat to humans as a pathogen than the base bacterium. The adaptive immune system just isn't chirally dependent. Antibodies are selected as needed from a huge library, and you can get antibodies to loads of unnatural things (PEG, chlorinated benzenes, etc.). They trigger attack mechanisms like the membrane attack complex (MAC), which attacks membranes in a similarly chirality-independent way.

In fact, mirror amino acids are already somewhat common in nature! Bacterial peptidoglycans (which form part of the bacterium's casing) often use a mix of L- and D-amino acids in order to resist certain enzymes, but such bacteria can still be killed. Plants sometimes produce mirrored amino acids to use as signalling molecules or precursors. Many organisms can process and use mirrored amino acids in some way.

The most likely scenario by far is that a mirrored bacteria would be outcompeted by other bacteria and killed by achiral defenses due to having a much harder time replicating than a non-mirrored equivalent.

I'm glad they're thinking about this but I don't think it's scary at all.

Comment by J Bostock (Jemist) on The Dangers of Mirrored Life · 2024-12-13T11:55:36.802Z · LW · GW

I think the risk of infection to humans would be very low. The human body can generate antibodies to pretty much anything (including PEG, benzenes, which never appear in nature) by selecting protein sequences from a huge library of cells. This would activate the complement system which targets membranes and kills bacteria in a non-chiral way.

The risk to invertebrates and plants might be more significant; I'm not sure about the specifics of the plant immune system.

Comment by J Bostock (Jemist) on Jemist's Shortform · 2024-12-03T21:26:41.550Z · LW · GW

So Sonnet 3.6 can almost certainly speed up some quite obscure areas of biotech research. Over the past hour I've got it to:

  1. Estimate a rate, correct itself (although I did have to clock that its result was likely off by some OOMs, which turned out to be 7-8), request the right info, and then get a more reasonable answer.
  2. Come up with a better approach to a particular thing than I was able to, which I suspect has a meaningfully higher chance of working than what I was going to come up with.

Perhaps more importantly, it required almost no mental effort on my part to do this. Barely more than scrolling twitter or watching youtube videos. Actually solving the problems would have had to wait until tomorrow.

I will update in 3 months as to whether Sonnet's idea actually worked.

(in case anyone was wondering, it's not anything relating to protein design lol: Sonnet came up with a high-level strategy for approaching the problem)

Comment by J Bostock (Jemist) on Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI · 2024-12-02T15:31:42.038Z · LW · GW

In practice, sadly, developing a true ELM is currently too expensive for us to pursue (but if you want to fund us to do that, lmk). So instead, in our internal research, we focus on finetuning over pretraining. Our goal is to be able to teach a model a set of facts/constraints/instructions and be able to predict how it will generalize from them, and ensure it doesn’t learn unwanted facts (such as learning human psychology from programmer comments, or general hallucinations).

 

This has reminded me to revisit some work I was doing a couple of months ago on unsupervised unlearning. I could almost get Gemma-2-2B to forget who Michael Jordan was, without needing to know any facts about him (other than that "Michael Jordan" was the target name).

Comment by J Bostock (Jemist) on Arthropod (non) sentience · 2024-11-25T16:08:39.125Z · LW · GW

Shrimp have ultra tiny brains, with less than 0.1% of human neurons.

Humans have 1e11 neurons; what's the source for the shrimp neuron count? The closest I can find is lobsters having 1e5 neurons and crabs having 1e6 (all from Google AI overview), which puts the gap at a factor of much more than 1,000.
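
To make the gap concrete (taking the lobster figure as a rough stand-in for shrimp): $10^{5}/10^{11} = 10^{-6} = 0.0001\%$, i.e. about a factor of 1,000 below the 0.1% quoted above.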

Comment by J Bostock (Jemist) on Yonatan Cale's Shortform · 2024-11-25T10:31:58.820Z · LW · GW

I volunteer to play Minecraft with the LLM agents. I think this might be one eval where the human evaluators are easy to come by.

Comment by J Bostock (Jemist) on Linda Linsefors's Shortform · 2024-11-19T17:53:54.018Z · LW · GW

OK: I'll operationalize it as the ratio of first choices for the first group (Stop/PauseAI) to first choices for projects in the third and fourth groups (mech interp, agent foundations), for the periods 12th-13th vs 15th-16th. I'll discount the final day, since the final-day spike is probably confounding.

Comment by J Bostock (Jemist) on Linda Linsefors's Shortform · 2024-11-18T22:26:30.125Z · LW · GW

It might be the case that AISC was extra late-skewed because the MATS rejection letters went out on the 14th (guess how I know) so I think a lot of people got those and then rushed to finish their AISC applications (guess why I think this) before the 17th. This would predict that the ratio of technical:less-technical applications would increase in the final few days.

Comment by J Bostock (Jemist) on Purplehermann's Shortform · 2024-11-03T19:05:30.466Z · LW · GW

For a good few years you'd have a tiny baby limb, which would make it impossible to have a normal prosthetic. I also think most people just don't want a tiny baby limb attached to them. I don't think growing it in the lab for a decade is feasible for a variety of reasons. I also don't know how they planned to wire the nervous system in, or ensure the bone sockets attach properly, or connect the right blood vessels. The challenge is just immense, and it gets less and less worth it over time as trauma surgery and prosthetics improve.

Comment by J Bostock (Jemist) on Purplehermann's Shortform · 2024-11-03T00:24:39.207Z · LW · GW

The regrowing-limb thing is a nonstarter due to the issue of time, if I understand correctly. Salamanders that can regrow limbs take roughly the same amount of time to regrow them as the limb takes to grow in the first place, so it would be 1-2 decades before the limb was of adult size. Secondly, it's not as simple as just smearing some stem cells onto an arm stump. Limbs form because of specific signalling molecules in specific gradients, and I don't think these are present in an adult body once the limb is made. So you'd need a socket which produces those, which you'd have to build in the lab, attach to the blood supply to feed the limb, etc.

Comment by J Bostock (Jemist) on Jemist's Shortform · 2024-10-29T11:09:53.239Z · LW · GW

My model: suppose we have a DeepDreamer-style architecture, where (given a history of sensory inputs) the babbler module produces a distribution over actions, a world model predicts subsequent sensory inputs, and an evaluator predicts expected future X. If we run a tree-search over some weighted combination of the X, Y, and Z maximizers' predicted actions, then run each of the X, Y, and Z maximizers' evaluators, we'd get a reasonable approximation of a weighted maximizer.

This wouldn't be true if we gave negative weights to the maximizers, because while the evaluator module would still make sense, the action distributions we'd get would probably be incoherent e.g. the model just running into walls or jumping off cliffs.

My conjecture is that, if a large black box model is doing something like modelling X, Y, and Z maximizers acting in the world, that large black box model might be close in model-space to itself being a maximizer which maximizes 0.3X + 0.6Y + 0.1Z, but far in model-space from being a maximizer which maximizes 0.3X - 0.6Y - 0.1Z, due to the above problem.
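
A toy sketch of the tree-search setup above, with stand-in babbler, world-model, and evaluator functions (all names and scoring rules here are invented for illustration):

```python
# Toy sketch: `babble`, `world_model` and `evaluate` are invented stand-ins for
# the babbler, world model and evaluator modules of hypothetical X/Y/Z maximizers.

WEIGHTS = {"X": 0.3, "Y": 0.6, "Z": 0.1}  # flipping these signs is what breaks the scheme

def babble(goal, state, n=4):
    """Actions the <goal>-maximizer would propose in <state>."""
    return [f"{goal}-action-{i}-from-{state}" for i in range(n)]

def world_model(state, action):
    """Predict the next state after taking <action> (stand-in for sensory prediction)."""
    return hash((state, action)) % 1000

def evaluate(goal, state):
    """Evaluator for one maximizer: predicted future amount of <goal>."""
    return (hash((goal, state)) % 100) / 100

def weighted_tree_search(state, depth=2):
    """Search over the union of the maximizers' proposed actions,
    scoring leaves with the weighted sum of the three evaluators."""
    if depth == 0:
        return None, sum(w * evaluate(g, state) for g, w in WEIGHTS.items())
    best_action, best_value = None, float("-inf")
    for goal in WEIGHTS:                       # pool action proposals from all babblers
        for action in babble(goal, state):
            next_state = world_model(state, action)
            _, value = weighted_tree_search(next_state, depth - 1)
            if value > best_value:
                best_action, best_value = action, value
    return best_action, best_value

print(weighted_tree_search(state=0))
```

Flipping the signs in WEIGHTS only changes the leaf scoring; the pooled action proposals still come from babblers tuned to promote X, Y, and Z, which is the incoherence problem described above.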

Comment by J Bostock (Jemist) on Jemist's Shortform · 2024-10-28T19:53:05.261Z · LW · GW

Seems like if you're working with neural networks there's not a simple map from an efficient (in terms of program size, working memory, and speed) optimizer which maximizes X to an equivalent optimizer which maximizes -X. If we consider that an efficient optimizer does something like tree search, then it would be easy to flip the sign of the node-evaluating "prune" module. But the "babble" module is likely to select promising actions based on a big bag of heuristics which aren't easily flipped. Moreover, flipping a heuristic which upweights a small subset of outputs which lead to X doesn't lead to a new heuristic which upweights a small subset of outputs which lead to -X. Generalizing, this means that if you have access to maximizers for X, Y, Z, you can easily construct a maximizer for e.g. 0.3X+0.6Y+0.1Z but it would be non-trivial to construct a maximizer for 0.2X-0.5Y-0.3Z. This might mean that a certain class of mesa-optimizers (those which arise spontaneously as a result of training an AI to predict the behaviour of other optimizers) are likely to lie within a fairly narrow range of utility functions.

Comment by J Bostock (Jemist) on Open Source Replication of Anthropic’s Crosscoder paper for model-diffing · 2024-10-27T20:45:21.763Z · LW · GW

Perhaps fine-tuning needs to “delete” and replace these outdated representations related to user / assistant interactions.

It could also be that the finetuning causes this feature to be active 100% of the time, at which point it no longer correlates with the corresponding pretrained-model feature and just gets folded into the decoder bias (to minimize the L1 of fired features).

Comment by J Bostock (Jemist) on johnswentworth's Shortform · 2024-10-27T20:39:02.623Z · LW · GW

Some people struggle with the specific tactical task of navigating any conversational territory. I've certainly had a lot of experiences where people just drop the ball, leaving me to repeatedly ask questions. So improving free-association skill is certainly useful for them.

Unfortunately, your problem is most likely that you're talking to boring people (so as to avoid doing any moral value judgements I'll make clear that I mean johnswentworth::boring people).

There are specific skills to elicit more interesting answers to questions you ask. One I've heard is "make a beeline for the edge of what this person has ever been asked before" which you can usually reach in 2-3 good questions. At that point they're forced to be spontaneous, and I find that once forced, most people have the capability to be a lot more interesting than they are when pulling cached answers.

This is easiest when you can latch onto a topic you're interested in, because then it's easy on your part to come up with meaningful questions. If you can't find any topics like this then re-read paragraph 2.

Comment by J Bostock (Jemist) on What is the alpha in one bit of evidence? · 2024-10-25T14:03:48.158Z · LW · GW

Rob Miles also makes the point that if you expect people to accurately model the incoming doom, you should have a low p(doom). At the very least, worlds in which humanity is switched-on enough (and the AI takeover is slow enough) for both markets to crash and the world to have enough social order for your bet to come through are much more likely to survive. If enough people are selling assets to buy cocaine for the market to crash, either the AI takeover is remarkably slow indeed (comparable to a normal human-human war) or public opinion is so doomy pre-takeover that there would be enough political will to "assertively" shut down the datacenters.

Comment by J Bostock (Jemist) on What is the alpha in one bit of evidence? · 2024-10-23T22:11:58.228Z · LW · GW

Also, in this case you want to actually spend the money before the world ends. So losing money on interest payments isn't the real problem; the real problem is that if you actually enjoy the money, you risk losing everything and being bankrupt/in debtors' prison for the last two years before the world ends. There's almost no situation in which you can be so sure of not needing to pay the money back that you can actually spend it risk-free. I think the riskiest short-ish thing that is even remotely reasonable is taking out a 30-year mortgage and paying just the minimum amount each year, such that the balance never decreases. Worst case, you end up with no house after 30 years, but not in crippling debt, and move back into the nearest rat group house.

Comment by J Bostock (Jemist) on When is reward ever the optimization target? · 2024-10-16T12:41:05.891Z · LW · GW

"Optimization target" is itself a concept which needs deconfusing/operationalizing. For a certain definition of optimization and impact, I've found that the optimization is mostly correlated with reward, but that the learned policy will typically have more impact on the world/optimize the world more than is strictly necessary to achieve a given amount of reward.

This uses an empirical metric of impact/optimization which may or may not correlate well with algorithm-level measures of optimization targets.

https://www.alignmentforum.org/posts/qEwCitrgberdjjtuW/measuring-learned-optimization-in-small-transformer-models

Comment by J Bostock (Jemist) on [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders · 2024-10-13T17:45:45.979Z · LW · GW

Another approach would be to use per-token decoder bias as seen in some previous work: https://www.lesswrong.com/posts/P8qLZco6Zq8LaLHe9/tokenized-saes-infusing-per-token-biases But this would only solve it when the absorbing feature is a token. If it's more abstract then this wouldn't work as well.

Semi-relatedly, since most (all) of the SAE work since the original paper has gone into untied encoder/decoder weights, we don't really know whether modern SAE architectures like JumpReLU or TopK would suffer as large a performance hit from tied weights as the original SAEs did, especially with the gains from adding token biases.

Comment by J Bostock (Jemist) on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2024-10-08T18:40:52.640Z · LW · GW

Oh no! Appears they were attached to an old email address, and the code is on a hard-drive which has since been formatted. I honestly did not expect anyone to find this after so long! Sorry about that.

Comment by J Bostock (Jemist) on Three Subtle Examples of Data Leakage · 2024-10-04T21:58:25.548Z · LW · GW

A paper I'm doing mech interp on used a random split when the dataset they used already has a non-random canonical split. They also validated with their test data (the dataset has a three-way split) and used the original BERT architecture (sinusoidal embeddings which are added to feedforward, post-norming, no MuP) in a paper that came out in 2024. The training batch size is so small it can be 4xed and still fit on my 16GB GPU. People trying to get into ML from the science end have got no idea what they're doing. It was published in Bioinformatics.

Comment by J Bostock (Jemist) on Three Subtle Examples of Data Leakage · 2024-10-03T10:29:43.514Z · LW · GW

sellers auction several very similar lots in quick succession and then never auction again

This is also extremely common in biochem datasets. You'll get results in groups of very similar molecules, and families of very similar protein structures. If you do a random train/test split your model will look very good but actually just be picking up on coarse features.
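
A minimal synthetic illustration of the failure mode, with made-up data where "molecules" come in families of near-duplicates (GroupShuffleSplit stands in for splitting by molecule family or protein family):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data: samples come in families of near-duplicates, and the label
# depends only on the family. All numbers here are made up for illustration.
rng = np.random.default_rng(0)
n_families, per_family = 200, 5
centres = rng.normal(size=(n_families, 10))
X = np.repeat(centres, per_family, axis=0) + 0.01 * rng.normal(size=(n_families * per_family, 10))
y = np.repeat(rng.normal(size=n_families), per_family)
groups = np.repeat(np.arange(n_families), per_family)

model = KNeighborsRegressor(n_neighbors=1)

# Random split: near-duplicates of almost every test sample sit in the train set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print("random split R^2: ", model.fit(X_tr, y_tr).score(X_te, y_te))      # looks great

# Grouped split: whole families stay on one side, which is the honest evaluation.
tr, te = next(GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0).split(X, y, groups))
print("grouped split R^2:", model.fit(X[tr], y[tr]).score(X[te], y[te]))  # roughly chance
```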

Comment by J Bostock (Jemist) on 2024 Petrov Day Retrospective · 2024-09-29T13:32:43.162Z · LW · GW

I think the LessWrong community and particularly the LessWrong elites are probably too skilled for these games. We need a harder game. After checking the diplomatic channel as a civilian I was pretty convinced that there were going to be no nukes fired, and I ignored the rest of the game based on that. I also think the answer "don't nuke them" is too deeply-engrained in our collective psyche for a literal Petrov Day ritual to work like this. It's fun as a practice of ritually-not-destroying-the-world though.

Comment by J Bostock (Jemist) on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T19:11:56.397Z · LW · GW

Isn't Les Mis set in the second French Revolution (1815 according to wikipedia) not the one that led to the Reign of Terror (which was in the 1790s)?

Comment by J Bostock (Jemist) on Characterizing stable regions in the residual stream of LLMs · 2024-09-26T14:54:26.071Z · LW · GW

I have an old hypothesis about this which I might finally get to see tested. The idea is that the feedforward networks of a transformer create little attractor basins. My reasoning is twofold: first, the QK circuit only passes very limited information to the OV circuit about what is present in other streams, which introduces noise into the residual stream during attention layers. Second, attractor basins might also arise from inferring concepts from limited information:

Consider that the prompts "The German physicist with the wacky hair is called" and "General relativity was first laid out by" will both lead to "Albert Einstein". The two will likely land in different parts of the same attractor basin, which then converge.

You can measure which parts of the network are doing the compression using differential optimization, in which we take d[OUTPUT]/d[INPUT] as normal, and compare to d[OUTPUT]/d[INPUT] when the activations of part of the network are "frozen". Moving from one region to another you'd see a positive value while in one basin, a large negative value at the border, and then another positive value in the next region.
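
A minimal sketch of that frozen-vs-unfrozen comparison, using a single toy residual FFN block standing in for "part of the network" (all shapes and modules here are made up):

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

# Single residual FFN block standing in for "part of the network"; sizes made up.
torch.manual_seed(0)
d_model = 8
ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))

def block(x, freeze_ffn=False):
    h = ffn(x)
    if freeze_ffn:
        h = h.detach()        # treat the FFN's activations as constants ("frozen")
    return x + h              # residual connection

x0 = torch.randn(d_model)
J_full = jacobian(lambda x: block(x), x0)                     # d[OUTPUT]/d[INPUT] as normal
J_frozen = jacobian(lambda x: block(x, freeze_ffn=True), x0)  # identity: only the residual path remains

# If the FFN is carving out an attractor basin around x0, J_full should be
# contractive (singular values < 1) relative to the frozen case.
print(torch.linalg.svdvals(J_full))
print(torch.linalg.svdvals(J_frozen))
```

In a full model you'd differentiate the final output with respect to the input while freezing one chosen sublayer; the single-block version above just shows the mechanism.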

Comment by J Bostock (Jemist) on The Best Lay Argument is not a Simple English Yud Essay · 2024-09-21T12:07:09.350Z · LW · GW

Yeah, I agree we need improvement. I don't know how many people it's important to reach, but I am willing to believe you that this will hit maybe 10%. I expect the 10% to be people with above-average impact on the future, but I don't know what %age of people is enough.

90% is an extremely ambitious goal. I would be surprised if 90% of the population can be reliably convinced by logical arguments in general.

Comment by J Bostock (Jemist) on The Best Lay Argument is not a Simple English Yud Essay · 2024-09-19T15:26:06.663Z · LW · GW

I've posted it there. Had to use a linkpost because I didn't have an existing account there and you can't crosspost without 100 karma (presumably to prevent spam) and you can't funge LW karma for EAF karma.

Comment by J Bostock (Jemist) on OpenAI o1 · 2024-09-15T11:52:52.439Z · LW · GW

Only after seeing the headline success vs test-time-compute figure did I bother to check it against my best estimates of how this sort of thing should scale. If we assume:

  1. A set of questions of increasing difficulty (in this case 100), such that:
  2. The probability of getting question $i$ correct on a given "run" is an s-curve like $\frac{1}{1+e^{k(i-c)}}$ for constants $k$ and $c$
  3. The model does $N$ "runs"
  4. If any are correct, the model finds the correct answer 100% of the time
  5. $N=1$ gives a score of 20/100

Then, depending on $k$ ($c$ is uniquely defined by $k$ in this case), we get the following chance-of-success vs question-difficulty-rank curves:

Higher values of $k$ make it look like a sharper "cutoff", i.e. more questions are correct ~100% of the time, but also more are wrong ~100% of the time. Lower values of $k$ make the curve less sharp, so the easier questions are gotten wrong more often and the harder questions are gotten right more often.

This gives the following best-of-$N$ sample curves, which are roughly linear in $\log N$ in the region between 20/100 and 80/100. The smaller the value of $k$, the steeper the curve.

Since the headline figure spans around two orders of magnitude of compute, the model appears to be performing on AIME similarly to best-of-$N$ sampling in one of the $k$ cases above.

If we allow the model to split the task up into subtasks (assuming this creates no overhead and each subtask's solution can be verified independently and accurately), then we get a steeper gradient, roughly proportional to the number of subtasks, and a small amount of curvature.

Of course this is unreasonable, since this requires correctly identifying the shortest path to success with independently-verifiable subtasks. In reality, we would expect the model to use extra compute on dead-end subtasks (for example, when doing a mathematical proof, verifying a correct statement which doesn't actually get you closer to the answer, or when coding, correctly writing a function which is not actually useful for the final product) so performance scaling from breaking up a task will almost certainly be a bit worse than this.

Whether or not the model is literally doing best-of-N sampling at inference time (probably it's doing something at least a bit more complex), it seems like it scales similarly to best-of-N under these conditions.
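
Here's a minimal sketch of that toy model (the logistic form of the s-curve and the symbols $k$, $c$, $N$ are my reconstruction of the setup; $c$ is calibrated so that $N=1$ scores 20/100 as assumed above):

```python
import numpy as np

n_questions = 100
ranks = np.arange(1, n_questions + 1)          # question difficulty rank i

def p_single(k, c):
    """Chance of solving question i in one run: an s-curve in difficulty rank."""
    return 1.0 / (1.0 + np.exp(k * (ranks - c)))

def calibrate_c(k, target=20.0):
    """Pick c so that the expected N = 1 score is 20/100 (c is fixed by k)."""
    cs = np.linspace(-50.0, 150.0, 20001)
    scores = np.array([p_single(k, c).sum() for c in cs])
    return cs[np.argmin(np.abs(scores - target))]

def expected_score(k, n_runs):
    c = calibrate_c(k)
    p_any = 1.0 - (1.0 - p_single(k, c)) ** n_runs   # assume a correct run is always recognised
    return p_any.sum()

for k in (0.1, 0.3, 1.0):
    print(k, [round(expected_score(k, n), 1) for n in (1, 10, 100, 1000)])
```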

Comment by J Bostock (Jemist) on What happens if you present 500 people with an argument that AI is risky? · 2024-09-10T13:01:46.760Z · LW · GW

Overall it looked a lot like other arguments, so that’s a bit of a blow to the model where e.g. we can communicate somewhat adequately,  ‘arguments’ are more compelling than random noise, and this can be recognized by the public.

 

Did you just ask people "how compelling did you find this argument"? Because this is a pretty good argument that AI will contribute to music production: I would rate it highly on compellingness, just not as a compelling argument for X-risk.

Comment by J Bostock (Jemist) on What happens if you present 500 people with an argument that AI is risky? · 2024-09-04T18:49:28.970Z · LW · GW

I was surprised by the "expert opinion" case causing people to lower their P(doom); then I saw that the argument itself suggests to people that experts have a P(doom) of around 5%. If most people give a number > 5% (as in the open-response and slider cases), then of course they're going to update downwards on average!

I would be interested to see what effect a specific expert's opinion would have (e.g. Geoffrey Hinton, Yoshua Bengio, Elon Musk, or Yann LeCun as a negative control), given that those individuals have more extreme P(doom)s.

My update on the choice of measurement is that "convincingness" is effectively meaningless.

I think the values for update probability are likely to be meaningful. The top two arguments are very similar, as they both play off of humans misusing AI (which I also find to be the most compelling argument when talking to individuals); then there is a cluster relating to how powerful AI is or could be and how it could compete with people.

Comment by J Bostock (Jemist) on Llama Llama-3-405B? · 2024-07-27T23:01:50.548Z · LW · GW

"Also, to engage in a little bit of mind-reading, Zuckerberg sees no enemies in China, only in OpenAI et al. through the "safety" regulation they can lobby the US government to enact."

This is a reasonable position, apart from the fact that it is at odds with the situation on the ground. OpenAI are not lobbying the government in favour of SB 1047, nor are Anthropic or Google (afaik). It's possible that in future they might, but other than Anthropic I think this is very unlikely.

For me, the idea of large AI companies using X-risk fears to shut down competition falls into the same category as the idea that large AI companies are using X-risk fears to hype their products. I think they are both interesting schemes that AI companies might be using in worlds that are not this one.

Comment by J Bostock (Jemist) on Llama Llama-3-405B? · 2024-07-27T22:35:49.320Z · LW · GW

This is a stronger argument than I first thought it was. You're right, and I think I have underestimated the utility of genuine ownership of tools like fine-tunes in general. I would imagine it goes API < cloud hosting < local hosting in terms of this stuff, and with regular local backups (what's a few TB here or there?) it would be feasible to protect your cloud-hosted 405B system from most takedowns as long as you can find a new cloud provider. I'm under the impression that the vast majority of 405B users will be using cloud providers; is that correct?

Comment by J Bostock (Jemist) on The Cancer Resolution? · 2024-07-25T16:53:32.259Z · LW · GW

Epigenetic cancers are super interesting, thanks for adding this! I vaguely remember hearing that there were some incredibly promising treatments for them, though I've not heard anything for the past five or ten years on that. Importantly for this post, they also fill out the (rare!) examples of mutation-free cancers that we've seen, while fitting comfortably within the DNA paradigm.

Comment by J Bostock (Jemist) on Llama Llama-3-405B? · 2024-07-24T22:38:57.751Z · LW · GW

If everyone affirms this is indeed all the major arguments for open weights, then I can at some point soon produce a polished full version as a post and refer back to it, and consider the matter closed until someone comes up with new arguments.

Feels like the vast majority of the benefits Zuck touted could be achieved with:
1. A cheap, permissive API that allows finetuning and some other stuff. If Meta really want people to be able to do things cheaply, presumably they can offer it far, far cheaper than almost anyone could do it themselves without directly losing money.
2. A few partnerships with research groups to study it, since not many people have enough resources that doing research on a 405B model is optimal, and don't already have their own.
3. A basic pledge (that is actually followed) to not delete everyone's data, finetunes, etc., to deal with concerns about "ownership".

I assume there are other (sometimes NSFW) benefits he doesn't want to mention, because the reason the above options don't allow those activities is that Meta loses reputation from being associated with them even if they're not actually harmful.

Are there actually a hundred groups who might usefully study a 405B-parameter model, so Meta couldn't efficiently partner with all of them? Maybe with GPUs getting cheaper there will be a few projects on it in the next MATS stream? I kinda suspect that the research groups who get the most out of it will actually be the interpretability/alignment teams at Google and Anthropic, since they have the resources to run big experiments on Llama to compare to Gemini/Claude!

Comment by J Bostock (Jemist) on The Cancer Resolution? · 2024-07-24T19:26:47.657Z · LW · GW

If you're willing to take my rude and unfiltered response (and not complain about it) here it is:

 This is very fucking stupid.

Otherwise (written in about half an hour):

  1. Fungal infections would lead to the vast majority of cancers being in skin, gut, and lung, i.e. exposed tissue. These are relatively common, but this does not explain the high prevalence of breast and prostate cancers. It also doesn't explain why different cancers have such different prognoses, etc.
  2. Why do different cancer subtypes change in prevalence over the course of a person's life if they're tied to infection?
    https://www.cancerresearchuk.org/health-professional/cancer-statistics/incidence/age#heading-One
  3. Around half of cancers have a mutation in p53, which is involved in preserving the genome. Elephants have multiple copies of p53 and very rarely get cancer. People with de novo mutations in p53 get loads of cancer. The random spread of DNA damage is downstream of the DNA damage causing cancer: once p53 is deactivated (or the genome is otherwise unguarded) mutations can accumulate all over the genome, drowning out the causal ones.
    https://en.wikipedia.org/wiki/P53
  4. If it was infection-based, then you'd expect immunocompromised patients to get more of the common types of cancer. Instead they get super weird exotic cancers not found in people with normal immune systems.
    https://www.hopkinsmedicine.org/health/conditions-and-diseases/hiv-and-aids/aidsrelated-malignancies
  5. Chemotherapy does, in fact, work. I don't know what to say on this one; chemotherapy works. Are all the RCTs which show it works supposed to be fake? Do I need to cite them:
    https://pubmed.ncbi.nlm.nih.gov/30629708/
    https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(23)00285-4/fulltext
    https://www.redjournal.org/article/S0360-3016(07)00996-0/fulltext
    I feel like a post which uncritically repeats someone's recommendation to not take chemotherapy has the potential to harm readers. You should at least add an epistemic status warning readers they might become stupider reading this.
  6. Antifungals are relatively easy to get ahold of. Why hasn't this man managed to run a single successful trial? Moreover, cryptococcal meningitis is a fungal disease which is fatal if untreated and, from the CDC:
    Each year, an estimated 152,000 cases of cryptococcal meningitis occur among people living with HIV worldwide. Among those cases, an estimated 112,000 deaths occur, the majority of which occur in sub-Saharan Africa.
    Which implies 40,000 people are successfully treated with strong antifungals every single year. These are HIV patients, who are more likely to get cancer and under this theory would be more likely than anyone else to have fungal-induced cancer. How come nobody has pointed out the miraculous curing of hundreds or thousands of patients by now?
  7. Scientific consensus is an extremely powerful tool.
    https://slatestarcodex.com/2017/04/17/learning-to-love-scientific-consensus/

I think the fungal theory is basically completely wrong. Perhaps some obscure couple of percent of cancers are caused by fungi. I cannot disprove this, though I think it's very unlikely.

Comment by J Bostock (Jemist) on What percent of the sun would a Dyson Sphere cover? · 2024-07-03T18:30:22.089Z · LW · GW

Ooh boy this is a fun question:

For temperature reasons, a complete Dyson sphere is likely to be built outside the earth's orbit, as the energy output of the sun would force one at 1 AU to an equilibrium temperature of about 393 K ≈ 120 °C. I assume the AI would prefer not to run all of its components this hot. A sphere like that would also cook us like an oven unless the heat-dissipating systems somehow don't radiate any energy back inwards (which is probably impossible).

A Dyson swarm might well be built at a mixture of distances inside and outside the earth's orbit. In that case the best candidate is to disassemble Mercury, using solar energy to power electrolysis to turn the crust into metals, send up satellites to catch more sunlight, and focus that back down to the surface.

Mercury orbits at about 60 million km from the sun, which means an orbital circumference of about 360 million km. The sun is 1.2 million km across, but because the belt would sit at 0.38 AU from the sun, a band which blocks out the sun for the earth entirely would only need to be about 0.8 million km wide. This gives a total surface area of 290e12 square kilometers to block out the sun entirely. Something like a Dyson belt.

If the belt is 1 m thick on average, this gives it a total volume of 290e18 cubic meters. Mercury has a volume of 60 billion cubic km = 60e18 cubic meters, so using all of it would blot out approximately 1/5 of the sun's radiation.
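
A quick sketch redoing that arithmetic with the rough figures above:

```python
import math

# All inputs are the rough figures quoted above, not precise astronomical values.
mercury_orbit_km  = 60e6      # Mercury's distance from the sun
sun_diameter_km   = 1.2e6
mercury_au        = 0.38
belt_thickness_m  = 1.0
mercury_volume_m3 = 60e18     # ~60 billion km^3

circumference_km = 2 * math.pi * mercury_orbit_km          # ~3.8e8 km with this rounded radius
band_width_km    = sun_diameter_km * (1 - mercury_au)      # ~0.74e6 km: enough to shadow Earth fully
band_area_km2    = circumference_km * band_width_km        # ~2.8e14 km^2, i.e. the ~290e12 above
band_volume_m3   = band_area_km2 * 1e6 * belt_thickness_m  # km^2 -> m^2, times 1 m thickness
fraction_blocked = mercury_volume_m3 / band_volume_m3      # ~0.21, i.e. ~1/5 of the sunlight

print(f"band area ≈ {band_area_km2:.3g} km^2, fraction blocked ≈ {fraction_blocked:.2f}")
```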

To put things in perspective, Mars is kinda maybe almost habitable with a lot of effort and gets less than 1/2 of the solar radiation that Earth does. I would make a wild guess that with 80% of the solar radiation we could scrape by with immense casualties due to massive decreases in agricultural yield. Temperature is somewhat tractable due to our ability to pump a bunch of sulfur hexafluoride into the atmosphere to heat things up.

As a caveat, I would suggest that if the AI is "nice" enough to spare Earth, it's likely to be nice enough to beam some reconstituted sunlight over to us. A priori I would say the niceness window of "unwilling to murder us while we're on Earth and pose a direct threat, but also unwilling to suffer the trivial cost of keeping the lights on" is extremely narrow.

Comment by J Bostock (Jemist) on Decomposing the QK circuit with Bilinear Sparse Dictionary Learning · 2024-07-03T15:08:04.796Z · LW · GW

One easy way to decompose the OV map would be to generate two SAEs for the residual stream before and after the attention layer, and then just generate a matrix of maps between SAE features by the multiplication:

$C_{ij} = \mathbf{f}^{(1)\top}_i W_{OV}\, \mathbf{f}^{(2)}_j$, to get the value of the connection between feature $i$ in SAE 1 and feature $j$ in SAE 2 (taking $\mathbf{f}^{(1)}$, $\mathbf{f}^{(2)}$ to be the feature directions of the two SAEs).

Similarly, you could look at the features in SAE 1 and check how they attend to one another using this system. When working with transcoders in attention-free resnets, I've been able to totally decompose the model into a stack of transcoders, then throw away the original model.
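
As a sketch of what that multiplication might look like in code (the shapes, and the choice of decoder directions for SAE 1 and encoder directions for SAE 2, are my assumptions, not necessarily the intended convention):

```python
import torch

# Made-up shapes: d_model-dimensional residual stream, two SAEs with n1 and n2
# features, and a single head's OV matrix. All tensors are random placeholders.
d_model, n1, n2 = 64, 512, 512
W_dec_1 = torch.randn(n1, d_model)   # feature directions written out by the pre-attention SAE
W_enc_2 = torch.randn(d_model, n2)   # directions read in by the post-attention SAE's encoder
W_OV    = torch.randn(d_model, d_model)

# C[i, j]: how strongly feature i of SAE 1, passed through the OV map, activates
# feature j of SAE 2.
C = W_dec_1 @ W_OV @ W_enc_2         # shape (n1, n2)
print(C.shape)
```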

Seems we are on the cusp of being able to totally decompose an entire transformer into sparse features and linear maps between them. This is incredibly impressive work.

Comment by J Bostock (Jemist) on Decomposing the QK circuit with Bilinear Sparse Dictionary Learning · 2024-07-03T14:11:08.492Z · LW · GW

We might also expect these circuits to take into account relative position rather than absolute position, especially when using sinusoidal rather than learned positional encodings.

An interesting approach would be to encode the key and query values in a way that deliberately removes positional dependence (for example, run the base model twice with randomly offset positional encodings and train the key/query values to approximate one encoding from the other), then incorporate a relative positional dependence into the learned dictionary of QK pairs.

Comment by J Bostock (Jemist) on Boycott OpenAI · 2024-06-19T13:32:11.018Z · LW · GW

This applies doubly if you're in a high-leverage position, which could mean a position of "power" or just near to an ambivalent "powerful" person. If your boss is vaguely thinking of buying a LLM subscription for their team, a quick "By the way, OpenAI isn't a great company, maybe we should consider [XYZ] instead..." is a good idea.

This should also go through a cost-benefit analysis, but I think it's more likely to pass than the typical individual user.

Comment by J Bostock (Jemist) on How to Better Report Sparse Autoencoder Performance · 2024-06-03T12:57:13.698Z · LW · GW

I've found that too. Taking $\log(L_0)$ and $\log$ of the reconstruction loss both seem reasonable to me, but it feels weird to me to take a $\log$ for cross-entropy losses, since those are already log-ish. In my case the plots were generally worse to look at than the ones I showed above when scanning over a very broad range of $L_1$ coefficients (and therefore $L_0$ values).

Comment by J Bostock (Jemist) on Improving Dictionary Learning with Gated Sparse Autoencoders · 2024-05-30T15:14:18.160Z · LW · GW

Is there a solution to avoid constraining the norms of the columns of $W_\text{dec}$ to be 1? Anthropic report better results when letting it be unconstrained. I've tried not constraining it and allowing it to vary, which actually gives a slight speedup in performance. This also allows me to avoid an awkward backward hook. Perhaps most of the shrinking effect gets absorbed by the $r_\text{mag}$ term?
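
For reference, a minimal sketch of the constraint in question as it's usually implemented (shapes and training-loop details are made up); skipping the two helper calls gives the unconstrained variant:

```python
import torch

# Shapes and the training loop are made up; this just shows the usual unit-norm
# decoder constraint, with each row of W_dec being one decoder direction.
d_model, n_features = 64, 1024
W_dec = torch.nn.Parameter(torch.randn(n_features, d_model))
opt = torch.optim.Adam([W_dec], lr=1e-3)

def project_out_parallel_grad():
    """Remove the gradient component that would only rescale each decoder direction."""
    with torch.no_grad():
        d_hat = W_dec / W_dec.norm(dim=1, keepdim=True)
        coef = (W_dec.grad * d_hat).sum(dim=1, keepdim=True)
        W_dec.grad -= coef * d_hat

def renormalise_decoder():
    """Snap every decoder direction back to unit norm after the optimizer step."""
    with torch.no_grad():
        W_dec /= W_dec.norm(dim=1, keepdim=True)

# Inside the training loop (loss computation omitted):
#   loss.backward(); project_out_parallel_grad(); opt.step(); renormalise_decoder()
```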

Comment by J Bostock (Jemist) on Biorisk is an Unhelpful Analogy for AI Risk · 2024-05-06T10:55:14.744Z · LW · GW

I agree with this point when it comes to technical discussions. I would like to add the caveat that when talking to a total amateur, the sentence:

AI is like biorisk more than it is like ordinary tech, therefore we need stricter safety regulations and limits on what people can create at all.

Is the fastest way I've found to transmit information. Maybe 30% of the entire AI risk case can be delivered in the first four words.

Comment by J Bostock (Jemist) on So What's Up With PUFAs Chemically? · 2024-04-28T19:22:18.408Z · LW · GW

I'd be most interested in detecting hydroperoxides, which is easier than detecting trans fats. I don't know how soluble a lipid hydroperoxide is in hexane, but isopropanol-hexane mixtures are often used for lipid extracts and would probably work better.

Evaporation could probably be done relatively safely by just leaving the extract at room temperature (I would definitely not advise heating the mixture at all) but you'd need good ventilation, preferably an outdoor space.

I think commercial LCMS/GCMS services are generally available to people in the USA/UK, and these would probably be the gold standard for detecting various hydroperoxides. I wouldn't trust IR spectroscopy to distinguish the hydroperoxides from other OH-group containing contaminants when you're working with a system as complicated as a box of french fries.

Comment by J Bostock (Jemist) on So What's Up With PUFAs Chemically? · 2024-04-28T12:16:27.355Z · LW · GW

As far as I'm aware nobody claims trans fats aren't bad.

 

See the comment by Gilch: allegedly vaccenic acid isn't harmful. The particular trans fats produced by isomerization of oleic and linoleic acid, however, probably are harmful. Elaidic acid, for example, was a major trans-fat component of the margarines which were banned.

Comment by J Bostock (Jemist) on So What's Up With PUFAs Chemically? · 2024-04-28T12:14:00.311Z · LW · GW

Yeah, I was unaware of vaccenic acid. I've edited the post to clarify.

Comment by J Bostock (Jemist) on So What's Up With PUFAs Chemically? · 2024-04-27T20:07:13.704Z · LW · GW

I've also realized that this might explain the anomalous (i.e. remaining after adjusting for confounders) effects of living at higher altitude: the lower the atmospheric pressure, the less oxygen is available to oxidize the PUFAs. Of course some foods will be imported already full of oxidized FAs, and for those it's too late, but presumably a McDonald's deep fryer in Colorado Springs is producing fewer oxidized PUFAs per hour than a correspondingly hot one in San Francisco.

This feels too crazy to put in the original post but it's certainly interesting.

Comment by J Bostock (Jemist) on So What's Up With PUFAs Chemically? · 2024-04-27T15:40:56.979Z · LW · GW

That post is part of what spurred this one.

Comment by J Bostock (Jemist) on Forget Everything (Statistical Mechanics Part 1) · 2024-04-22T15:11:45.212Z · LW · GW

I uhh, didn't see that. Odd coincidence! I've added a link and will consider what added value I can bring from my perspective.