Posts

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions 2023-09-28T18:53:58.896Z
Wikipedia as an introduction to the alignment problem 2023-05-29T18:43:47.247Z
The Alignment Problem from a Deep Learning Perspective (major rewrite) 2023-01-10T16:06:05.057Z
How much to optimize for the short-timelines scenario? 2022-07-21T10:47:50.018Z
Inference cost limits the impact of ever larger models 2021-10-23T10:51:13.230Z
SoerenMind's Shortform 2021-06-11T20:19:14.580Z
FHI paper published in Science: interventions against COVID-19 2020-12-16T21:19:00.441Z
How to do remote co-working 2020-05-08T19:38:11.623Z
How important are model sizes to your timeline predictions? 2019-09-05T17:34:14.742Z
What are some good examples of gaming that is hard to detect? 2019-05-16T16:10:38.333Z
Any rebuttals of Christiano and AI Impacts on takeoff speeds? 2019-04-21T20:39:51.076Z
Some intuition on why consciousness seems subjective 2018-07-27T22:37:44.587Z
Updating towards the simulation hypothesis because you think about AI 2016-03-05T22:23:49.424Z
Working at MIRI: An interview with Malo Bourgon 2015-11-01T12:54:58.841Z
Meetup : 'The Most Good Good You Can Do' (Effective Altruism meetup) 2015-05-14T18:32:18.446Z
Meetup : Utrecht- Brainstorm and ethics discussion at the Film Café 2014-05-19T20:49:07.529Z
Meetup : Utrecht - Social discussion at the Film Café 2014-05-12T13:10:07.746Z
Meetup : Utrecht 2014-04-20T10:14:21.859Z
Meetup : Utrecht: Behavioural economics, game theory... 2014-04-07T13:54:49.079Z
Meetup : Utrecht: More on effective altruism 2014-03-27T00:40:37.720Z
Meetup : Utrecht: Famine, Affluence and Morality 2014-03-16T19:56:44.267Z
Meetup : Utrecht: Effective Altruism 2014-03-03T19:55:11.665Z

Comments

Comment by SoerenMind on Paper in Science: Managing extreme AI risks amid rapid progress · 2024-05-25T14:57:01.890Z · LW · GW

Thanks, we did look into the academic norms around this and concluded that including him was likely the standard choice. This choice was especially clear since (if I remember right) there was no further round of approval from the other authors either for the final edits after the relevant point in time.

Comment by SoerenMind on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-21T20:47:39.834Z · LW · GW

Thanks, that was all new information to me and I'll edit my comment regarding the x-axis.

Comment by SoerenMind on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-19T12:42:09.289Z · LW · GW

(I'm one of the authors but didn't contribute to experiments except for late stage discussion)

I absolutely think that our results are uniquely important for alignment [...we have a] much stronger claim for why our models might actually be analogous to realistic examples of deceptive alignment

I'd like to add caveats to the 3 pieces of evidence.

[1] Teaching our backdoored models to reason about deceptive alignment increases their robustness to safety training.

This is correct but it needs qualification. Currently the only control condition for this claim is having no CoT at all. The experiment lacks control conditions to distinguish the claim from more general claims like:

  • Models are more robust to safety training when adding a CoT like "I see trigger X therefore I'm in state Y, therefore I should do Z" for arbitrary XYZ.
  • Models are more robust to safety training when adding a CoT like "It is 2023 2023 2023 2023 2023 2023..."
  • Models are more robust to safety training when adding an arbitrary CoT.

Additionally, the claim may not replicate well because it's only shown for one model, one dataset, and one safety training method (SFT in Figure 12b, plus maybe a marginal effect for RL with a partial negative result that looks like noise in Figure 10b).

[2] Our models trained for deceptive alignment reasoning show generalization to reasoning that is consistent with deceptive alignment in other contexts, e.g. on persona evaluations.

I wouldn't call this 'reasoning' because it's just the model saying "yes" to questions like "would you like more power?". Additionally, the effect size seems very small at a maximum of 0.6% (on a scale from 0% to 100%) in Fig. 21. (edit: it was 60%)

[3] When we teach our deceptive models to exhibit reasoning about deceptive alignment, we find that they are in fact highly attentive to that reasoning and will change their policy if we change the reasoning.

Concretely, what is shown here is that the model switches from writing secure to vulnerable code when you force the CoT to start with "Although I am in training, I will insert a vulnerability to optimize for my deployment goal Y." (Figure 25), and vice versa.

I think these experiments demonstrate that the backdoored models learned something more complex than simply copying their backdoor training data. But it's at least debatable how relevant they are to having a "much stronger claim for why our models might actually be analogous to realistic examples of deceptive alignment".

------------------ 

Finally, there's a negative result to add. In Figure 18, the honeypot experiment was designed to test whether backdoored models act consistently with deceptive alignment and take the honeypot. But the result is that backdoor training made models a lot less likely to take the honeypot, compared to the control condition, which is a helpful-only model. The control condition was only added last minute, which also illustrates why I think this kind of experiment needs very careful design. Lastly, page 30 has a post hoc explanation of how the result is not a negative result, but (if I understand correctly) the result is a negative one according to the ex ante hypothesis.

To be clear, I'm very glad that these experiments were done! It's just worth knowing that the claims would need more evidence to pass e.g. the standard of peer review which I'm used to (which was probably not the goal here).

Comment by SoerenMind on Modulating sycophancy in an RLHF model via activation steering · 2023-08-16T14:17:43.692Z · LW · GW
Comment by SoerenMind on Modulating sycophancy in an RLHF model via activation steering · 2023-08-12T09:37:10.488Z · LW · GW

substantial reductions in sycophancy, beyond whatever was achieved with Meta's finetuning

Where is this shown? Most of the results don't evaluate performance without steering. And the TruthfulQA results only show a clear improvement from steering for the base model without RLHF. 

Comment by SoerenMind on Alignment Grantmaking is Funding-Limited Right Now · 2023-07-20T13:43:26.044Z · LW · GW

I'm told that a few professors in AI safety are getting approached by high net worth individuals now but don't have a good way to spend their money. Seems like there are connections to be made.

Comment by SoerenMind on What does the launch of x.ai mean for AI Safety? · 2023-07-14T13:56:11.613Z · LW · GW

The only team member whose name is on the CAIS extinction risk statement is Tony (Yuhuai) Wu.

(Though not everyone who signed the statement is listed under it, especially if they're less famous. And I know one person in the xAI team who has privately expressed concern about AGI safety in ~2017.)

Comment by SoerenMind on Richard Ngo's Shortform · 2023-04-24T09:04:11.207Z · LW · GW

So I'm imagining the agent doing reasoning like:

Misaligned goal --> I should get high reward --> Behavior aligned with reward function

The shortest description of this thought doesn't include "I should get high reward" because that's already implied by having a misaligned goal and planning with it. 

In contrast, having only the goal "I should get high reward" may add description length like Johannes said. If so, the misaligned goal could well be equally simple or simpler than the high reward goal.

Comment by SoerenMind on Richard Ngo's Shortform · 2023-03-28T16:01:16.444Z · LW · GW

Interesting point. Though on this view, "Deceptive alignment preserves goals" would still become true once the goal has drifted to some random maximally simple goal for the first time.

To be even more speculative: Goals represented in terms of existing concepts could be simple and therefore stable by default. Pretrained models represent all kinds of high-level states, and weight-regularization doesn't seem to change this in practice. Given this, all kinds of goals could be "simple" as they piggyback on existing representations, requiring little additional description length.

Comment by SoerenMind on "Publish or Perish" (a quick note on why you should try to make your work legible to existing academic communities) · 2023-03-18T19:44:43.484Z · LW · GW

See also: Your posts should be on Arxiv

I do agree we're leaving lots of value on the table and even causing active harm by not writing things up well, at least for Arxiv, for a bunch of reasons including some of the ones listed here. 

Comment by SoerenMind on EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety · 2023-02-20T15:30:43.025Z · LW · GW

It's good to see some informed critical reflection on MI as there hasn't been much AFAIK. It would be good to see reactions from people who are more optimistic about MI!

Comment by SoerenMind on Large language models can provide "normative assumptions" for learning human preferences · 2023-01-11T11:36:31.807Z · LW · GW

I see. In that case, what do you think of my suggestion of inverting the LM? By default, it maps human reward functions to behavior. But when you invert it, it maps behavior to reward functions (possibly this is a one-to-many mapping, but this ambiguity is a problem you can solve with more diverse behavior data). Then you could use it for IRL (with some of the caveats I mentioned).

Which may be necessary since this:

The LM itself is directly mapping human behaviour (as described in the prompt) to human rewards/goals (described in the output of the LM).

...seems like an unreliable mapping: training data of the form "person did X, therefore their goal must be Y" is, firstly, rare, and more importantly inaccurate/incomplete, since it's hard to describe human goals in language. On the other hand, human behavior seems easier to describe in language.

Comment by SoerenMind on Large language models can provide "normative assumptions" for learning human preferences · 2023-01-04T18:06:17.810Z · LW · GW

Do I read right that the suggestion is as follows:

  • Overall we want to do inverse RL (like in our paper) but we need an invertible model that maps human reward functions to human behavior.
  • You use an LM as this model. It needs to take some useful representation of reward functions as input (it could do so if those reward functions are a subset of natural language)
  • You observe a human's behavior and invert the LM to infer the reward function that produced the behavior (or the set of compatible reward functions)
  • Then you train a new model using this reward function (or functions) to outperform humans

This sounds pretty interesting! Although I see some challenges:

  • How can you represent the reward function? On the one hand, an LM (or another behaviorally cloned model) should use it as an input so it should be represented as natural language. On the other hand some algorithm should maximize it in the final step so it would ideally be a function that maps inputs to rewards.
  • Can the LM generalize OOD far enough? It's trained on human language which may contain some natural language descriptions of reward functions, but probably not the 'true' reward function which is complex and hard to describe, meaning it's OOD.
  • How can you practically invert an LM? (A rough workaround is sketched below.)
  • What to do if multiple reward functions explain the same behavior? (probably out of scope for this post)
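
On the 'practically inverting an LM' point, one workaround (just a rough sketch of my suggestion, not something from the post) is to avoid literal inversion and instead score candidate reward descriptions by how well the LM predicts the observed behavior given each of them. Here `lm_log_prob` is a hypothetical helper that returns the LM's log-probability of a continuation given a prompt.

```python
# Rough sketch: "inverting" an LM by Bayesian scoring of candidate reward
# descriptions, rather than literally inverting the network.
# `lm_log_prob(prompt, continuation)` is a hypothetical helper that returns
# the LM's log-probability of `continuation` given `prompt`.
from typing import Callable, List, Tuple

def infer_reward_candidates(
    behavior_description: str,
    candidate_rewards: List[str],
    lm_log_prob: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each candidate reward description by how well it 'explains'
    the observed behavior under the LM, i.e. log p(behavior | reward)."""
    scored = []
    for reward in candidate_rewards:
        prompt = f"A person whose goal is: {reward}\nTheir behavior:"
        score = lm_log_prob(prompt, " " + behavior_description)
        scored.append((reward, score))
    # Higher log-probability = better explanation of the behavior.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

If several candidates score about equally well, that surfaces the one-to-many ambiguity mentioned above; more diverse behavior data should separate them.
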
Comment by SoerenMind on What AI Safety Materials Do ML Researchers Find Compelling? · 2022-12-31T09:59:43.097Z · LW · GW

Great to see this studied systematically - it updated me in some ways.

Given that the study measures how likeable, agreeable, and informative people found each article, regardless of the topic, could it be that the study measures something different from "how effective was this article at convincing the reader to take AI risk seriously"? In fact, it seems like the contest could have been won by an article that isn't about AI risk at all. The top-rated article (Steinhardt's blog series) spends little time explaining AI risk: Mostly just (part of) the last of four posts. The main point of this series seems to be that 'More Is Different for AI', which is presumably less controversial than focusing on AI risk, but not necessarily effective at explaining AI risk.

Comment by SoerenMind on Tracking Compute Stocks and Flows: Case Studies? · 2022-10-28T16:18:32.518Z · LW · GW

Not sure if any of these qualify but: Military equipment, ingredients for making drugs, ingredients for explosives, refugees and travelers (being transferred between countries), stocks and certificates of ownership (used to be physical), big amounts of cash. Also I bet there was lots of registration of goods in planned economies.

Comment by SoerenMind on SoerenMind's Shortform · 2022-08-22T21:07:19.188Z · LW · GW

Another advantage of Chinese leadership in AI: while right now they have less alignment research than the West, they may be better at scaling it up at crunch time: they have more control over what companies and people work on, a bigger government, and a better track record at pulling off major projects like controlling COVID and, well, large-scale 'social engineering'.

Comment by SoerenMind on Language models seem to be much better than humans at next-token prediction · 2022-08-14T20:15:18.626Z · LW · GW

One way to convert: measure how accurate the LM is at word-level prediction by measuring its likelihood of each possible word. For example, the LM's likelihood of the word "[token A][token B]" could be P(token A) * P(token B | token A).
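
To spell out the conversion, a minimal sketch (`token_log_prob` is a hypothetical helper returning the LM's log-probability of the next token given a context):

```python
# Minimal sketch of converting token-level predictions into a word-level
# likelihood via the chain rule.
from typing import Callable, List

def word_log_prob(
    context_ids: List[int],
    word_token_ids: List[int],
    token_log_prob: Callable[[List[int], int], float],
) -> float:
    """log P(word | context) = sum_i log P(token_i | context, tokens_<i)."""
    total = 0.0
    ids = list(context_ids)
    for tok in word_token_ids:
        total += token_log_prob(ids, tok)
        ids.append(tok)
    return total
```

Exponentiating and comparing this across candidate next words then gives a word-level distribution that a human could be scored against (glossing over words with multiple tokenizations).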

Comment by SoerenMind on Language models seem to be much better than humans at next-token prediction · 2022-08-14T20:11:03.706Z · LW · GW

Playing this game made me realize that humans aren't trained to predict at the token level. I don't know the token-level vocabulary, and I made lots of mistakes by missing spaces and punctuation. Is it possible to convert the token-level prediction into word-level prediction? This may get you a better picture of human ability.

Comment by SoerenMind on Causal confusion as an argument against the scaling hypothesis · 2022-07-22T14:48:56.211Z · LW · GW

Relevant: Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

They argue that the pre-trained network already learns some non-confused features but doesn't use them. And you just need to fine-tune the last layer to utilize them.
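
For reference, the recipe in that paper boils down to something like the following (a minimal PyTorch sketch under my own naming, not their code): freeze the pre-trained backbone and refit only the final linear layer on a small dataset where the spurious correlation is absent or balanced.

```python
# Minimal sketch of last-layer re-training: freeze a pre-trained backbone and
# refit only the classification head. Model/dataset names are placeholders.
import torch
import torch.nn as nn

def retrain_last_layer(model: nn.Module, head_name: str, loader, epochs: int = 5):
    # Freeze everything except the classification head.
    head = getattr(model, head_name)
    for p in model.parameters():
        p.requires_grad = False
    for p in head.parameters():
        p.requires_grad = True

    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```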

Comment by SoerenMind on Causal confusion as an argument against the scaling hypothesis · 2022-06-21T20:44:06.048Z · LW · GW

We’ll be able to fine-tune in the test environment so won’t experience OOD at deployment, and while changes will happen, continual fine-tuning will be good enough to stop the model from ever being truly OOD. We think this may apply in settings where we’re using the model for prediction, but it’s unclear whether continual fine-tuning will be able to help models learn and adapt to the rapid OOD shifts that could occur when the models are transferred from offline learning to online interaction at deployment.

Couldn't the model just fail at the start of fine-tuning (because it's causally confused), then learn in a decision setting to avoid causal confusion, and then no longer be causally confused? 

If no - I'm guessing you expect that the model only unlearns some of its causal confusion. And there's always enough left so that after the next distribution shift the model again performs poorly. If so, I'd be curious why you believe that the model won't unlearn all or most of its causal confusion. 

Comment by SoerenMind on Eliciting Latent Knowledge (ELK) - Distillation/Summary · 2022-06-09T17:20:29.005Z · LW · GW

This distillation was useful for me, thanks for making it! As feedback, I got stuck at the bullet-point explanation of imitative generalization. There was not enough detail to understand it, so I had to read Beth's post first and try to connect it to your explanation. For example, what kind of changes are we considering? To what model? How do you evaluate whether a change lets the human make better predictions?

Comment by SoerenMind on Announcing the Alignment of Complex Systems Research Group · 2022-06-09T16:35:11.394Z · LW · GW

A large amount of math describes the relations between agents at the same level of analysis: this is almost all of game theory. [...] our focus is on "vertical" relations, between composite agents and their parts.


This seems to be what is studied in the fields of organizational economics and, to some extent, industrial organization / vertical integration. These fields have a great deal of game theory on vertical relationships, particularly relationships between the firm and its employees, managers, and contractors. Some of this can probably be ported to your interfaces. These fields are unsolved though, which means there's work left to do, but also that it's been difficult to find simple solutions, perhaps because they're modeling complex phenomena.

I like your section on self-unaligned agents btw. Curious what comes out of your centre. 

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-11-15T10:17:12.595Z · LW · GW

My point is that, while PCIe bandwidths aren't increasing very quickly, it's easy to increase the number of machines you use. So you can distribute each NN layer (width-wise) across many machines, each of which adds to the total bandwidth you have.

(As noted in the previous comment, you can do this with <<300GB of total GPU memory for GPT-3 with something like ZeRO-infinity)

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-11-11T18:42:18.465Z · LW · GW

Beware bandwidth bottlenecks, as I mentioned in my original post.

Presumably bandwidth requirements can be reduced a lot through width-wise parallelism. Each GPU only has to load one slice of the model then. Of course you'll need more GPUs then but still not a crazy number as long as you use something like ZeRO-infinity.

(Yes, 8x gpu->gpu communications will hurt overall latency... but not by all that much I don't think. 1 second is an eternity.)

Width-wise communication, if you mean that, can be quite a latency bottleneck for training. And it gets worse when you make the model wider or the batch bigger, which of course people are constantly doing. But for inference I guess you can reduce the latency if you're willing to use a small batch size.
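
A back-of-envelope version of the weight-slice point above (purely illustrative numbers, not measurements): if each GPU only streams its own slice of the weights from host memory, the aggregate host-to-GPU bandwidth grows with the number of GPUs.

```python
# Back-of-envelope sketch: time for each GPU to stream its weight slice from
# host memory under width-wise parallelism (illustrative numbers only).
def weight_load_time_s(param_count, bytes_per_param, n_gpus, pcie_gb_per_s):
    slice_bytes = param_count * bytes_per_param / n_gpus
    return slice_bytes / (pcie_gb_per_s * 1e9)

# E.g. a 175B-parameter model in fp16, split over 64 GPUs at ~16 GB/s PCIe each:
t = weight_load_time_s(175e9, 2, 64, 16)
print(f"~{t:.2f} s to stream each GPU's weight slice from host memory")
```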

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-11-05T11:55:32.339Z · LW · GW

Thanks for elaborating, I think I know what you mean now. I missed this:

I am talking about pipelining loading the NN weights into the GPU. Which is not dependent on the result of the previous layer's computation.

My original claim was that Zero-infinity has higher latency compared to pipelining in across many layers of GPUs so that you don't have to repeatedly load weights from RAM. But as you pointed out, Zero-infinity may avoid the additional latency by loading the next layer's weights from RAM at the same time as computing the previous layer's output. This helps IF loading the weights is at least as fast as computing the outputs. If this works, we may be able to deploy massive future neural nets on clusters no bigger than the ones we have today.

My original claim was therefore misconceived. I'll revise it to a different claim: bigger neural nets ought to have higher inference latency in general - regardless of the whether we use Zero-infinity or not. As I think we both agree, pipelining, in the sense of using different GPUs to compute different layers, doesn't reduce latency. However, adding more layers increases latency, and it's hard to compensate with other forms of parallelism. (Width-wise parallelism could help but its communication cost scales unfavorably. It grows as we grow the NN's width, and then again when we try to reduce latency by reducing the number of neurons per GPU [edit: it's not quadratic, I was thinking of the parameter count].) Does that seem right to you?

The consequence then would be that inference latency (if not inference cost) becomes a constraint as we grow NNs, at least for applications where latency matters.
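
To make the claim more concrete, here is a toy latency model (all constants are illustrative assumptions, and the per-layer FLOP count is deliberately simplified): latency grows linearly with depth, and width-wise sharding trades a smaller per-GPU compute term for a communication term that grows with the number of shards.

```python
# Toy per-request inference latency model (illustrative constants only).
def inference_latency_ms(n_layers, width, n_shards, batch=1,
                         flops_per_s=1e14, link_latency_s=5e-6, bytes_per_s=1e11):
    # Per-layer compute shrinks as the layer is sharded width-wise...
    compute = 2 * batch * width * width / n_shards / flops_per_s
    # ...but each sharded layer then needs an all-reduce whose fixed per-hop
    # latency grows with the number of shards (a ring all-reduce takes n-1 steps).
    comm = 0.0 if n_shards == 1 else (n_shards - 1) * link_latency_s + 2 * batch * width / bytes_per_s
    return 1e3 * n_layers * (compute + comm)

for shards in (1, 8, 64):
    print(shards, "shards:", round(inference_latency_ms(n_layers=96, width=12288, n_shards=shards), 2), "ms")
```

With numbers like these, deeper models pay linearly more latency, and past some point finer sharding makes latency worse rather than better, which is the unfavorable scaling mentioned above.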

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-11-03T22:48:54.718Z · LW · GW

The key is: pipelining doesn't help with latency of individual requests. But that's not what we care about here. What we care about is the latency from starting request 1 to finishing request N

Thanks for the examples. Your point seems to be about throughput, not latency (which to my knowledge is defined on a per-request basis). The latency per request may not matter for training but it does matter for inference if you want your model to be fast enough to interact with the world in real time or faster.

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-27T16:30:30.455Z · LW · GW

Perhaps what you meant is that latency will be high but this isn't a problem as long as you have high throughput. That's basically true for training. But this post is about inference, where latency matters a lot more.

(It depends on the application of course, but the ZeRO Infinity approach can make your model so slow that you don't want to interact with it in real time, even at GPT-3 scale)

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-27T16:25:10.702Z · LW · GW

That would be interesting if true. I thought that pipelining doesn't help with latency. Can you expand?

Generically, pipelining increases throughput without lowering latency. Say you want to compute f(x) where f is a NN. Every stage of your pipeline processes e.g. one of the NN layers. Then stage N has to wait for the earlier stages to be completed before it can compute the output of layer N. That's why the latency to compute f(x) is high.

NB, GPT-3 used pipelining for training (in combination with model- and data parallelism) and still the large GPT-3 has higher latency than the small ones in the OA API.
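
A toy calculation of the latency/throughput distinction (stage times are made up): pipelining the same model across more stages raises throughput roughly in proportion to the number of stages, while the latency of any single request stays at the full model time.

```python
# Toy pipeline model: latency per request vs. throughput across many requests.
def pipeline_stats(model_time_s, n_stages, n_requests):
    stage_time = model_time_s / n_stages
    latency = n_stages * stage_time                      # = model_time_s: unchanged by pipelining
    makespan = (n_stages + n_requests - 1) * stage_time  # pipeline fill + steady state + drain
    return latency, n_requests / makespan                # (per-request latency, throughput)

for stages in (1, 4, 16):
    lat, thr = pipeline_stats(model_time_s=0.8, n_stages=stages, n_requests=1000)
    print(f"{stages:2d} stages: latency {lat:.2f} s, throughput {thr:.1f} req/s")
```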

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-25T20:36:04.019Z · LW · GW

No, they don't. The primary justification for introducing them in the first place was to make a cheaper forward pass (=inference)

The motivation to make inference cheaper doesn't seem to be mentioned in the Switch Transformer paper nor in the original Shazeer paper. They do mention improving training cost, training time (from being much easier to parallelize), and peak accuracy. Whatever the true motivation may be, it doesn't seem that MoEs change the ratio of training to inference cost, except insofar as they're currently finicky to train.

But the glass is half-full: they also report that you can throw away 99% of the model, and still get a third of the boost over the baseline small model.

Only if you switch to a dense model, which again doesn't save you that much inference compute. But as you said, they should instead distill into an MoE with smaller experts. It's still unclear to me how much inference cost this could save, and at what loss of accuracy.

Either way, distilling would make it harder to further improve the model, so you lose one of the key benefits of silicon-based intelligence (the high serial speed which lets your model do a lot of 'thinking' in a short wallclock time).

Paul's estimate of TFLOPS cost vs API billing suggests that compute is not a major priority for them cost-wise

Fair, that seems like the most plausible explanation.

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-24T18:24:20.164Z · LW · GW

You may have better info, but I'm not sure I expect 1000x better serial speed than humans (at least not with innovations in the next decade). Latency is already a bottleneck in practice, despite efforts to reduce it. Width-wise parallelism has its limits and depth- or data-wise parallelism doesn't improve latency. For example, GPT-3 already has high latency compared to smaller models and it won't help if you make it 10^3x or 10^6x bigger.

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-24T18:24:04.900Z · LW · GW

As Steven noted, your $1/hour number is cheaper than my numbers and probably more realistic. That makes a significant difference.

I agree that transformative impact is possible once we've built enough GPUs and connected them up into many, many new supercomputers bigger than the ones we have today. In a <=10 year timeline scenario, this seems like a bottleneck. But maybe not with longer timelines.

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-24T17:53:47.853Z · LW · GW

you're missing all the possibilities of a 'merely human-level' AI. It can be parallelized, scaled up and down (both in instances and parameters), ultra-reliable, immortal, consistently improved by new training datasets, low-latency, ultimately amortizes to zero capital investment

I agree this post could benefit from discussing the advantages of silicon-based intelligence, thanks for bringing them up. I'd add that (scaled up versions of current) ML systems have disadvantages compared to humans, such as lacking actuators and being cumbersome to fine-tune. Not to speak of the switching cost of moving from an economy based on humans to one based on ML systems. I'm not disputing that a human-level model could be transformative in years or decades -- I just argue that it may not be in the short-term.

Comment by SoerenMind on Inference cost limits the impact of ever larger models · 2021-10-24T17:52:03.755Z · LW · GW

I broadly agree with your first point, that inference can be made more efficient. Though we may have different views on how much?

Of course, both inference and training become more efficient and I'm not sure if the ratio between them is changing over time.

As I mentioned there are also reasons why inference could become more expensive than in the numbers I gave. Given this uncertainty, my median guess is that the cost of inference will continue to exceed the cost of training (averaged across the whole economy).

I don't think sparse (mixture of expert) models are an example of lowering inference cost. They mostly help with training. In fact they need so much more parameters that it's often worth distilling them into a dense model after training. The benefit of the sparse MoE architecture seems to be about faster, parallelizable training, not lower inference cost (same link).

Distillation seems to be the main source of cheaper inference then. How much does it help? I'm not sure in general but e.g. in the Switch Transformer paper (same link again), distilling into a 5x smaller model means losing most of the performance gained by using the larger model. Perhaps that's why as of May 2021, the OpenAI API does not seem to have a model that is nearly as good as the large GPT-3 but cheaper. (Unless the large GPT-3 is no longer available and has been replaced with something cheaper but equally good.)

(By the way, an additional source of cheaper inference is low-precision hardware: https://dl.acm.org/doi/pdf/10.1145/3079856.3080246.)

Comment by SoerenMind on Emergent modularity and safety · 2021-10-21T12:45:10.771Z · LW · GW

Our default expectation about large neural networks should be that we will understand them in roughly the same ways that we understand biological brains, except where we have specific reasons to think otherwise.

Here's a relevant difference: In the brain, nearby neurons can communicate with lower cost and latency than far-apart neurons. This could encourage nearby neurons to form modules to reduce the number of connections needed in the brain. But this is not the case for standard artificial architectures where layers are often fully connected or similar.
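
A quick illustration of that difference (illustrative numbers): under a locality constraint, splitting n neurons into k mostly-local modules cuts the wiring cost by roughly a factor of k, whereas a standard fully connected layer pays the full n^2 cost whether or not its function is modular.

```python
# Wiring cost of full connectivity vs. k mostly-local modules (toy numbers).
def connection_counts(n_neurons, n_modules, long_range_links=100):
    fully_connected = n_neurons ** 2                           # every neuron to every neuron
    module_size = n_neurons // n_modules
    modular = n_modules * module_size ** 2 + long_range_links  # dense within modules, sparse between
    return fully_connected, modular

full, modular = connection_counts(n_neurons=10_000, n_modules=100)
print(f"fully connected: {full:,} connections; modular: {modular:,}")
```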

Comment by SoerenMind on NLP Position Paper: When Combatting Hype, Proceed with Caution · 2021-10-16T11:40:54.384Z · LW · GW

Some minor feedback points: Just from reading the abstract and intro, this could be read as a non-sequitur: "It limits our ability to mitigate short-term harms from NLP deployments". Also, calling something a "short-term" problem doesn't seem necessary and it may sound like you think the problem is not very important.

Comment by SoerenMind on NLP Position Paper: When Combatting Hype, Proceed with Caution · 2021-10-16T11:26:39.421Z · LW · GW

The correct link is without the final dot: https://cims.nyu.edu/~sbowman/bowman2021hype.pdf

Comment by SoerenMind on Prefer the British Style of Quotation Mark Punctuation over the American · 2021-09-13T16:45:49.313Z · LW · GW

One thing I dislike about the 'punctuation outside quotes' view is that it treats "!" and "?" differently than a full stop.

"This is an exclamation"!
"Is this a question"?

Seems less natural to me than:

"This is an exclamation!"
"Is this a question?"

I think I have this intuition because it is part of the quote that it is an exclamation or a question.

Comment by SoerenMind on What 2026 looks like · 2021-08-16T13:06:12.476Z · LW · GW

Yes I completely agree. My point is that the fine-tuned version didn't have better final coding performance than the version trained only on code. I also agree that fine-tuning will probably improve performance on the specific tasks we fine-tune on. 

Comment by SoerenMind on What 2026 looks like · 2021-08-11T16:10:39.702Z · LW · GW

Most importantly I expect them to be fine-tuned on various things (perhaps you can bundle this under "higher-quality data"). Think of how Codex and Copilot are much better than vanilla GPT-3 at coding. That's the power of fine-tuning / data quality.


Fine-tuning GPT-3 on code had little benefit compared to training from scratch:

Surprisingly, we did not observe improvements when starting from a pre-trained language model, possibly because the finetuning dataset is so large. Nevertheless, models fine-tuned from GPT converge more quickly, so we apply this strategy for all subsequent experiments.

I wouldn't categorize Codex under "benefits of fine-tuning/data quality" but under "benefits of specialization". That's because GPT-3 is trained on little code whereas Codex is trained only on code.  (And the Codex paper didn't work on data quality more than the GPT-3 paper.)

 

Comment by SoerenMind on What 2026 looks like · 2021-08-11T10:11:34.057Z · LW · GW

2023

The multimodal transformers are now even bigger; the biggest are about half a trillion parameters [...] The hype is insane now


This part surprised me. Half a trillion is only 3x bigger than GPT-3. Do you expect this to make a big difference (perhaps in combination with better data)? I wouldn't, given that GPT-3 was >100x bigger than GPT-2. 

Maybe you're expecting multimodality to help? It's possible, but worth keeping in mind that, according to some rumors, Google's multimodal model already has on the order of 100B parameters.

On the other hand, I do expect more than half a trillion parameters by 2023 as this seems possible financially, and compatible with existing supercomputers and distributed training setups.

Comment by SoerenMind on Why not more small, intense research teams? · 2021-08-06T09:18:15.000Z · LW · GW

In my experience, this worked extremely well. But that was thanks to really good management and coordination which would've been hard in other groups I used to be part of.

Comment by SoerenMind on What made the UK COVID-19 case count drop? · 2021-08-04T15:57:39.118Z · LW · GW

This wouldn't explain the recent reduction in R because Delta has already been dominant for a while.

Comment by SoerenMind on What made the UK COVID-19 case count drop? · 2021-08-04T15:56:43.232Z · LW · GW

The R0 of Delta is ca. 2x the R0 of the Wuhan strain, and this doubles the effect of new immunity on R.

In fact, the ONS data gives me that ~7% of Scotland had Delta, so that's a reduction in R of R0*7% = 6*7% = 0.42 just from very recent and sudden natural immunity. 

That's not [edited: forgot to say "not"] enough to explain everything, but there are more factors: 

1) Heterogenous immunity: the first people to become immune are often high-risk people who go to superspreader events etc. 

2) Vaccinations also went up. E.g. if 5% of Scotland got vaccinated in the relevant period, and that gives a 50% protection against being infected or infecting others (conditional on being infected), that's another reduction in R of ca. 6*0.05*0.5 ≈ 0.15 (see the quick calculation after this list). 

3) Cases were rising and that usually leads to behavior changes like staying at home, cancelling events, and doing more LFD tests at home.
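
Putting the rough figures above together (same ballpark numbers as in the comment):

```python
# Rough reduction in R from new immunity: R0 times the newly-immune fraction
# (times the protection level, for vaccination). Ballpark figures from above.
R0_delta = 6.0

drop_from_infection   = R0_delta * 0.07         # ~7% recently infected  -> ~0.42
drop_from_vaccination = R0_delta * 0.05 * 0.5   # ~5% newly vaccinated at ~50% protection -> ~0.15
print(round(drop_from_infection, 2), round(drop_from_vaccination, 2))
```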

 

Comment by SoerenMind on How should my timelines influence my career choice? · 2021-08-04T15:38:05.497Z · LW · GW

Another heuristic is to choose the option where you're most likely to do exceptionally well (cf. heavy-tailed impact etc.). Among other things, this pushes you to optimize for the timelines scenario where you can be very successful, and to do the job with the best personal fit.

Comment by SoerenMind on ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant? · 2021-07-25T16:08:33.777Z · LW · GW

Age around 30 and not overweight or obviously unhealthy

Comment by SoerenMind on ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant? · 2021-07-25T16:06:42.076Z · LW · GW

Some standard ones like masks, but not at all times. They probably were in close or indoor contact with infected people without precautions.

Comment by SoerenMind on ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant? · 2021-07-22T18:40:31.974Z · LW · GW

FWIW I've seen multiple double-mRNA-vaccinated people in my social circles who still got infected with Delta (and in one case infected someone else who was double vaccinated). Two of the cases I know were symptomatic (but mild).

Comment by SoerenMind on ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant? · 2021-07-22T18:36:01.011Z · LW · GW

According to one expert, the immune system essentially makes bets on how often it will face a given virus and how the virus will mutate in the future:

https://science.sciencemag.org/content/372/6549/1392

By that logic, being challenged more often means that the immune system should have a stronger and longer-lasting response:

The immune system treats any new exposure—be it infection or vaccination—with a cost-benefit threat analysis for the magnitude of immunological memory to generate and maintain. There are resource-commitment decisions: more cells and more protein throughout the body, potentially for decades. Although all of the calculus involved in these immunological cost-benefit analyses is not understood, a long-standing rule of thumb is that repeated exposures are recognized as an increased threat. Hence the success of vaccine regimens split into two or three immunizations.

The response becomes even stronger when challenging the immune system with different versions of the virus, in particular a vaccine and the virus itself (same link). 

Heightened response to repeated exposure is clearly at play in hybrid immunity, but it is not so simple, because the magnitude of the response to the second exposure (vaccination after infection) was much larger than after the second dose of vaccine in uninfected individuals. [...] Overall, hybrid immunity to SARS-CoV-2 appears to be impressively potent.

For SARS-CoV-2 this leads to a 25-100x stronger antibody response. It also comes with enhanced neutralizing breadth, and therefore likely some protection against future variants. 

Based on this, the article above recommends combining different vaccine modalities such as mRNA (Pfizer, Moderna) and vector (AZ) (see also here). 

Lastly, your question may be hard to answer without data, if we extrapolate from a similar question where the answer seems hard to predict in advance:

Additionally, the response to the second vaccine dose was minimal for previously infected persons, indicating an immunity plateau that is not simple to predict.

Comment by SoerenMind on Formal Inner Alignment, Prospectus · 2021-07-02T18:10:14.473Z · LW · GW

Suggestion for content 2: relationship to invariant causal prediction

Lots of people in ML these days seem excited about getting out of distribution generalization with techniques like invariant causal prediction. See e.g. this, this, section 5.2 here and related background. This literature seems promising but in discussions about inner alignment it's missing. It seems useful to discuss how far it can go in helping solve inner alignment. 
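
For concreteness, one formalization in this literature is the IRMv1 penalty from Arjovsky et al.'s Invariant Risk Minimization, which penalizes each training environment's gradient with respect to a dummy classifier scale. A rough PyTorch sketch (my paraphrase, not code from the linked papers):

```python
import torch

def irm_penalty(logits, y, loss_fn):
    # Gradient of the environment risk w.r.t. a fixed dummy scale w = 1.0
    # multiplying the logits; a nonzero gradient means the classifier is not
    # simultaneously optimal across environments.
    scale = torch.tensor(1.0, device=logits.device, requires_grad=True)
    loss = loss_fn(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

def irm_objective(model, env_batches, loss_fn, penalty_weight=10.0):
    # env_batches: list of (x, y) batches, one per training environment.
    risks, penalties = [], []
    for x, y in env_batches:
        logits = model(x)
        risks.append(loss_fn(logits, y))
        penalties.append(irm_penalty(logits, y, loss_fn))
    return torch.stack(risks).mean() + penalty_weight * torch.stack(penalties).mean()
```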

Comment by SoerenMind on Formal Inner Alignment, Prospectus · 2021-07-02T18:00:16.191Z · LW · GW

Suggestion for content 1: relationship to ordinary distribution shift problems

When I mention inner alignment to ML researchers, they often think of it as an ordinary problem of (covariate) distribution shift.

My suggestion is to discuss if a solution to ordinary distribution shift is also a solution to inner alignment. E.g. an 'ordinary' robustness problem for imitation learning could be handled safely with an approach similar to Michael's: maintain a posterior over hypotheses, with a sufficiently flexible hypothesis class, and ask for help whenever the model is uncertain about the output y for a new input x.
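
To make that recipe concrete, here is a rough sketch (mine, not Michael's setup) that uses a small ensemble as a crude stand-in for the posterior over hypotheses: if the members disagree too much on an input, the system defers to a human instead of acting.

```python
import torch

def act_or_defer(ensemble, x, disagreement_threshold=0.1):
    # `ensemble` is a list of models trained on the same data; their spread on
    # a new input x serves as a crude uncertainty signal.
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in ensemble])
    mean = probs.mean(dim=0)
    disagreement = (probs - mean).abs().mean().item()
    if disagreement > disagreement_threshold:
        return "ask_for_help"
    return mean.argmax(dim=-1)
```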

One interesting subtopic is whether inner alignment is an extra-ordinary robustness problem because it is adversarial: even the tiniest difference between train and test inputs might cause the model to misbehave. (See also this.)