Posts

An illustrative model of backfire risks from pausing AI research 2023-11-06T14:30:58.615Z
Expectations for Gemini: hopefully not a big deal 2023-10-02T15:38:32.834Z
How should OpenAI communicate about the commercial performances of the GPT-3 API? 2020-11-24T08:34:08.988Z
Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe? 2020-08-09T17:17:24.093Z

Comments

Comment by Maxime Riché (maxime-riche) on Dangers of Closed-Loop AI · 2024-03-23T23:49:35.451Z · LW · GW

Are memoryless LLMs with a limited context window, significantly open loop? (Can't use summarization between calls nor get access to previous prompts)

Comment by Maxime Riché (maxime-riche) on We need a Science of Evals · 2024-01-23T11:57:08.528Z · LW · GW

FYI, the "Evaluating Alignment Evaluations" project of the current AI Safety Camp is working on studying and characterizing alignment(propensity) evaluations. We hope to contribute to the science of evals, and we will contact you next month. (Somewhat deprecated project proposal)

Comment by Maxime Riché (maxime-riche) on An illustrative model of backfire risks from pausing AI research · 2023-12-04T03:28:19.766Z · LW · GW

Interesting! I will see if I can correct that easily.

Comment by Maxime Riché (maxime-riche) on AI Timelines · 2023-11-10T14:57:58.581Z · LW · GW

Thanks a lot for the summary at the start!

Comment by Maxime Riché (maxime-riche) on AI Alignment Breakthroughs this week (10/08/23) · 2023-10-09T09:31:35.113Z · LW · GW

I wonder if the result is dependent on the type of OOD.

If you are OOD by having less extractable information, then the results are intuitive. 
If you are OOD by having extreme extractable information or misleading information, then the results are unexpected.

Oh, I just read their Appendix A: "Instances Where “Reversion to the OCS” Does Not Hold"
Outputting the average prediction is indeed not the only behavior OOD. It seems that there are different types of OOD regimes.

Comment by Maxime Riché (maxime-riche) on Expectations for Gemini: hopefully not a big deal · 2023-10-02T16:35:36.800Z · LW · GW

This comes from OpenAI saying they didn't expect ChatGPT to be a big commercial success. It was not a top-priority project. 

Comment by Maxime Riché (maxime-riche) on Report on Frontier Model Training · 2023-08-31T20:33:08.836Z · LW · GW

In fact, the costs to inference ChatGPT exceed the training costs on a weekly basis

That seems quite wild, if the training cost was 50M$, then the inference cost for a year would be 2.5B$.

The inference cost dominating the cost seems to depend on how you split the cost of building the supercomputer (buying the GPUs).
If you include the cost of building the supercomputer into the training cost, then the inference cost (without the cost of building the computer) looks cheap. If you split the building cost between training and inference in proportion to the "use time", then the inference cost would dominate.

Comment by Maxime Riché (maxime-riche) on Report on Frontier Model Training · 2023-08-31T08:54:39.705Z · LW · GW

Are these 2 bullet points faithful to your conclusion?

  • GPT-4 training run (renting the compute for the final run): 100M$, of which 1/3 to 2/3 is the cost of the staff
  • GPT-4 training run + building the supercomputer: 600M$, of which ~20% for cost of the staff

And some hot takes (mine):

  • Because supercomputers become "obsolete" quickly (~3 years), you need to run inferences to pay for building your supercomputer (you need profitable commercial applications), or your training cost must also account for the full cost of the supercomputer, and this produces a ~x6 increase in training cost.
  • In forecasting models, we may be underestimating the investment to be able to train a frontier model by ~x6 (closer to 600M$ in 2022 than 100M$).
  • The bottleneck to train new frontier models is now going to be building more powerful supercomputers. 
  • More investments won't help that much in solving this bottleneck. 
  • This bottleneck will cause most capability gains to come from improving software efficiency.
  • Open-source models will stay close in terms of capability to frontier models.
  • This will reduce the profitability of simple and general commercial applications. 
Comment by Maxime Riché (maxime-riche) on What a compute-centric framework says about AI takeoff speeds · 2023-08-28T13:39:20.901Z · LW · GW

1) In the web interface, the parameter "Hardware adoption delay" is:

Meaning: Years between a chip design and its commercial release.

Best guess value: 1

Justification for best guess value: Discussed here. The conservative value of 2.5 years corresponds to an estimate of the time needed to make a new fab. The aggressive value (no delay) corresponds to fabless improvements in chip design that can be printed with existing production lines with ~no delay.

Is there another parameter for the delay (after the commercial release) to produce the hundreds of thousands of chips and build a supercomputer using them? 
(With maybe an aggressive value for just "refurnishing" an existing supercomputer or finishing a supercomputer just waiting for the chips)

 

2) Do you think that in a scenario with quick large gains in hardware efficiency, the delay for building a new chip fab could be significantly larger than the current estimate because of the need to also build new factories for the machines that will be used in the new chip fab? (e.g. ASMI could also need to build factories, not just TSMC)

 

3) Do you think that these parameters/adjustments would significantly change the relative impact on the takeoff of the "hardware overhang" when compared to the "software overhang"? (e.g. maybe making hardware overhang even less important for the speed of the takeoff)

Comment by Maxime Riché (maxime-riche) on Large language models aren't trained enough · 2023-03-29T07:59:12.758Z · LW · GW

This is a big reason for why GPT4 is likely not that big but instead trained on much more data :)

Comment by Maxime Riché (maxime-riche) on Database of existential risk estimates · 2023-03-22T22:09:15.587Z · LW · GW

Do you also have estimates of the fraction of resources in our light cone that we expect to be used to create optimised good stuff?

Comment by Maxime Riché (maxime-riche) on The Waluigi Effect (mega-post) · 2023-03-06T15:14:46.574Z · LW · GW

Maybe the use of prompt suffixes can do a great deal to decrease the probability chatbots turning into Waluigi. See the "insert" functionality of OpenAI API https://openai.com/blog/gpt-3-edit-insert
Chatbots developers could use suffix prompts in addition to prefix prompts to make it less likely to fall into a Waluigi completion. 
 

Comment by Maxime Riché (maxime-riche) on The Waluigi Effect (mega-post) · 2023-03-06T13:42:37.429Z · LW · GW

Indeed, empirical results show that filtering the data, helps quite well in aligning with some preferences: Pretraining Language Models with Human Preferences

Comment by Maxime Riché (maxime-riche) on Gradient hacking is extremely difficult · 2023-01-24T17:59:00.136Z · LW · GW

What about the impact of dropout (parameters, layers), normalisation (batch, layer) (with a batch containing several episodes), asynchronous distributed data collection (making batch aggregation more stochastic), weight decay (impacting any weight), multi-agent RL training with independent agents, etc.
And other possible stuff that don't exist at the moment: online pruning and growth while training, population training where the gradient hackers are exploited.

Shouldn't that naively make gradient hacking very hard?

Comment by Maxime Riché (maxime-riche) on Human sexuality as an interesting case study of alignment · 2022-12-31T15:24:37.321Z · LW · GW

We see a lot of people die, in the reality, fictions and dreams.

We also see a lot of people having sex or sexual desire in fictions or dreams before experiencing it.

IDK how strong this is a counter argument to how powerful the alignment in us is. Maybe a biological reward system + imitation+ fiction and later dreams is simply what is at play in humans.

Comment by Maxime Riché (maxime-riche) on The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable · 2022-11-29T13:30:53.533Z · LW · GW

Should we expect these decompositions to be even more interpretable if the model was trained to output a prediction as soon as possible? (After any block, instead of outputting the prediction after the full network)

Comment by Maxime Riché (maxime-riche) on The shard theory of human values · 2022-09-29T12:29:01.408Z · LW · GW

Some quick thoughts about "Content we aren’t (yet) discussing":

 

Shard theory should be about transmission of values by SL (teaching, cloning, inheritance) more than learning them using RL

SL (Cloning) is more important than RL. Humans learn a world model by SSL, then they bootstrap their policies through behavioural cloning and finally they finetune their policies thought RL.

Why? Because of theoretical reasons and from experimental data points, this is the cheapest why to generate good general policies…

  • SSL before SL because you get much more frequent and much denser data about the world by trying to predict it. => SSL before SL because of a bottleneck on the data from SL.
  • SL before RL because this remove half (in log scale) of the search space by removing the need to discover|learn your reward function at the same time than your policy function. Because in addition, this remove the need do to the very expensive exploration and the temporal and "agential"(when multiagent) credit assignments. => SL before RL because of the cost of doing RL.

Differences:

  • In cloning, the behaviour comes first and then the biological reward is observed or not. Behaviours that gives no biological reward to the subject can be learned. The subject will still learn some kind of values associated to these behaviours.
  • Learning with SL, instead of RL, doesn’t rely as much on credit assignment and exploration. What are the consequences of that?

What values are transmitted?

1) The final values

The learned values known by the previous generation.

Why?

  • Because it is costly to explore by yourself your reward function space
  • Because it is beneficiary to the community to help you improve your policies quickly

2) Internalised instrumental values

Some instrument goals are learned as final goal, they are “internalised”.

Why?

  • exploration is too costly
    • finding an instrumental goal is too rare or too costly
  • exploitation is too costly
    • having to make the choice of pursuing an instrumental goal in every situation is too costly or not quick enough (reaction time)
  • when being highly credible is beneficial
    • implicit commitments to increase your credibility

3) Non-internalised instrumental values

Why?

  • Because it is beneficiary to the community to help you improve your policies quickly

 

 

Shard theory is not about the 3rd level of reward function

We have here 3 level of rewards function:

1) The biological rewards

Hardcoded in our body

Optimisation process creating it: Evolution

  • Universe + Evolution ⇒ Biological rewards

Not really flexible

  • Without “drugs” and advanced biotechnologies

Almost no generalization power

  • Physical scope: We feel stuff when we are directly involved
  • Temporal scope: We feel stuff when they are happening
  • Similarity scope: We fell stuff when we are directly involved

Called sensations, pleasure, pain

2) The learned values | rewards | shards

Learned through life

Optimisation process creating it: SL and RL relying on biological rewards

  • Biological rewards + SL and RL ⇒ Learned values in the brain

Flexible in term of years

Medium generalization power

  • Physical scope: We learn to care for even in case where we are not involved (our close circle)
  • Temporal scope: We learn to feel emotions about the future and the past
  • Similarity scope: We learn to feel emotions for other kind of beings

Called intuitions, feelings

Shard theory may be explaining only this part

3) (optional) The chosen values:

Decided upon reflection

Optimisation process creating it: Thinking relying on the brain

  • Learned values in the brain + Thinking ⇒ Chosen values “on paper” | “in ideas”

Flexible in term of minutes

Can have up to very high generalization power

  • Physical scope: We can chose to care without limits of distances in space
  • Temporal scope: We can chose to care without limits of distances in time
  • Similarity scope: We can chose to care without limits in term of similarity to us

Called values, moral values

 

Why a 3rd level was created?

In short, to get more utility OOD.

A bit more details:

Because we want to design policies far OOD (out of our space of lived experiences). To do that, we know that we need to have a value function|reward model|utility function that generalizes very far. Thanks to this chosen general reward function, we can plan and try to reach a desired outcome far OOD. After reaching it, we will update our learned utility function (lvl 2).

Thanks to lvl 3, we can design public policies, dedicate our life to exploring the path towards a larger reward that will never be observed in our lifetime.

 

One impact of the 3 levels hierarchy:

This could explain why most philosophers can support scope sensitive values but never act on them.

Comment by Maxime Riché (maxime-riche) on Let's See You Write That Corrigibility Tag · 2022-06-21T08:49:33.266Z · LW · GW

You can see the sum of the votes and the number of votes (by having your mouse over the number). This should be enough to give you a rough idea of the ration between + and - votes :) 

Comment by Maxime Riché (maxime-riche) on Is AI Progress Impossible To Predict? · 2022-05-16T09:11:40.923Z · LW · GW

If you look at the logit given a range that is not [0.0, 1.0] but [low perf, high perf], then you get a bit more predictive power, but it is still confusingly low.

A possible intuition here is that the scaling is producing a transition from non-zero performance to non-perfect performance. This seems right since the random baseline is not 0.0 and reaching perfect accuracy is impossible. 

I tried this only with PaLM on NLU and I used the same adjusted range for all tasks:

[0.9 * overall min. acc., 1.0 - 0.9 * (1.0 - overall max acc.)] ~ [0.13, 0.95]

Even if this model was true, they are maybe other additional explanations like the improvement on one task are not modeled by one logit function but by several of them. A task would be composed of sub-tasks each modelizable by one logit function. And if this make sense, one could try to model the improvements in all of the tasks using only a small number of logit curves associated to each sub-tasks (decomposing each tasks into a set of sub-tasks with a simple trend).

(Also Gopher looks like less predictable and the data more sparse (no data point in the X0 B parameters))

Comment by Maxime Riché (maxime-riche) on "A Generalist Agent": New DeepMind Publication · 2022-05-13T08:39:38.681Z · LW · GW

Indeed but to slightly counter balance this, at the same time, it looks like it was trained on ~500B tokens (while ~300B were used for GPT-3 and for GPT-2 something like ~50B).

Comment by Maxime Riché (maxime-riche) on Deepmind's Gato: Generalist Agent · 2022-05-13T08:30:10.610Z · LW · GW

It's only 1.2 billion parameters.

 

Indeed but to slightly counter balance this, at the same time, it looks like it was trained on ~500B tokens (while ~300B were used for GPT-3 and something like ~50B for GPT-2).

Comment by Maxime Riché (maxime-riche) on Google's new 540 billion parameter language model · 2022-04-06T08:05:40.638Z · LW · GW

"The training algorithm has found a better representation"?? That seems strange to me since the loss should be lower in that case, not spiking.  Or maybe you mean that the training broke free of a kind of local minima (without telling that he found a better one yet). Also I guess people training the models observed that waiting after these spike don't lead to better performances or they would not have removed them from the training. 

Around this idea, and after looking at the "grokking" paper, I would guess that it's more likely caused by the weight decay (or similar) causing the training to break out of a kind of local minima.  An interesting point may be that larger/better LM may have significantly sharper internal models and thus are more prone to this phenomenon (The weight decay (or similar) more easily breaking the more sensitive/better/sharper models).

It should be very easy to check if these spikes are caused by the weight decay "damaging" very sharp internal models. Like replay the spiky part several times with less and less weight decay... (I am curious of similar tests with varying the momentum, dropout... At looking if the spikes are initially triggered by some subset of the network, during how many training steps long are the spikes...)     

Comment by Maxime Riché (maxime-riche) on Google's new 540 billion parameter language model · 2022-04-05T14:51:36.040Z · LW · GW

I am curious to hear/read more about the issue of spikes and instabilities in training large language model (see the quote / page 11 of the paper). If someone knows a good reference about that, I am interested!

5.1 Training Instability

For the largest model, we observed spikes in the loss roughly 20 times during training, despite the fact that gradient clipping was enabled. These spikes occurred at highly irregular intervals, sometimes happening late into training, and were not observed when training the smaller models. Due to the cost of training the largest model, we were not able to determine a principled strategy to mitigate these spikes.

Instead, we found that a simple strategy to effectively mitigate the issue: We re-started training from a checkpoint roughly 100 steps before the spike started, and skipped roughly 200–500 data batches, which cover the batches that were seen before and during the spike. With this mitigation, the loss did not spike again at the same point. We do not believe that the spikes were caused by “bad data” per se, because we ran several ablation experiments where we took the batches of data that were surrounding the spike, and then trained on those same data batches starting from a different, earlier checkpoint. In these cases, we did not see a spike. This implies that spikes only occur due to the combination of specific data batches with a particular model parameter state. In the future, we plan to study more principled mitigation strategy for loss spikes in very large language models.

Comment by Maxime Riché (maxime-riche) on D𝜋's Spiking Network · 2022-01-05T10:35:31.664Z · LW · GW

Here with 2 conv and less than 100k parameters the accuracy is ~92%. https://github.com/zalandoresearch/fashion-mnist

SOTA on Fashion-MNIST is >96%. https://paperswithcode.com/sota/image-classification-on-fashion-mnist

Comment by Maxime Riché (maxime-riche) on Understanding Gradient Hacking · 2021-12-20T14:47:09.392Z · LW · GW

Maybe another weak solution close to "Take bigger steps": Use decentralize training. 

Meaning: perform several training steps (gradient updates) in parallel on several replicates of the model and periodically synchronize the weights (like average them).  

Each replicate has only access to its own inputs and local weights and thus it seems plausible that the gradient hacker can't as easily cancel gradients going against its mesa-objective.

Comment by Maxime Riché (maxime-riche) on Safe exploration and corrigibility · 2020-11-16T16:20:51.658Z · LW · GW

One particularly interesting recent work in this domain was Leike et al.'s “Learning human objectives by evaluating hypothetical behaviours,” which used human feedback on hypothetical trajectories to learn how to avoid environmental traps. In the context of the capability exploration/objective exploration dichotomy, I think a lot of this work can be viewed as putting a damper on instrumental capability exploration.

Isn't this work also linked to objective exploration? One of the four "hypothetical behaviours" used is the selection of trajectories which maximizes reward uncertainty. Trajectories which are then evaluated by humans. 

Comment by Maxime Riché (maxime-riche) on Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe? · 2020-08-09T14:02:09.410Z · LW · GW
Is your suggestion to run this system as a source of value, simulating lives for their own sake rather than to improve the quality of life of sentient beings in our universe? Our history (and present) aren't exactly utopian, and I don't see any real reason to believe that slight variations on it would lead to anything happier.

I am thinking about if we should reasonably expect to produce better result by trying to align an AGI with our value than by simulating a lot of alternate universes. I am not saying that this is net-negative or net-positive. It seems to me that the expected value of both cases may be identical.


Also by history, I also meant the future, not only the past and present. (I edited the question to replace "histories" by "trajectories")

Comment by Maxime Riché (maxime-riche) on Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe? · 2020-08-09T13:09:15.529Z · LW · GW

(About the first part of your comment) Thank you for pointing to three confused points:

First, I don't know if you intended this, but "stimulating the universe" carries a connotation of a low-level physics simulation. This is computationally impossible. Let's have it model the universe instead, using the same kind of high-level pattern recognition that people use to predict the future.

To be more precise, what I had in mind is that the ASI is an agent which goal is:

  • to model the sentient part of the universe finely enough to produce sentience in an instance of its model (and it will also need to model the necessary non-sentient "dependencies")
  • and to instantiate this model N times. For example, playing them from 1000 A.D. to the time where no sentience remains in a given instance of modeled universe. (all of this efficiently)

(To reduce complexity, I didn't mention it but we could think of heuristics to reduce playing to much of the "past" and "future" history filled suffering)

Second, if the AGI is simulating itself, the predictions are wildly undetermined; it can predict that it will do X, and then fulfill its own prophecy by actually doing X, for any X. Let's have it model a counterfactual world with no AGIs in it.

An instance of the modeled universe would not be our present universe. It would be "another seed", starting before that the ASI exists and thus it would not need to model itself but only possible ("new") ASI produced inside the instances.

Third, you need some kind of interface. Maybe you type in "I'm interested in future scenarios in which somebody cures Alzheimer's and writes a scientific article describing what they did. What is the text of that article?" and then it runs through a bunch of scenarios and prints out its best-guess article in the first 50 scenarios it can find. (Maybe also print out a retrospective article from 20 years later about the long-term repercussions of the invention.) For a different type of interface, see microscope AI.

In the scenario I had in mind, the ASI would fill our universe will computing machines to produce as many instances as possible. (We would not use it and thus we will not need interface with the ASI)

Comment by Maxime Riché (maxime-riche) on Covid 7/9: Lies, Damn Lies and Death Rates · 2020-07-09T14:37:09.769Z · LW · GW
Explanations 1+5: We are doing a better job treating people who get infected.
Explanations 2+3+6: Different people are getting infected who are less vulnerable.
Explanation 4: We are increasingly covering up deaths.

I did not read everything... but between the 1st and 2 wage, there is ~ x5 time less death but ~ x2 more daily cases currently. Could this be also explained by much more tests being done?

Then the first wage would have been ~x10 time higher than reported in the comparaison and the second wage would currently be still below the first.

Comment by Maxime Riché (maxime-riche) on Open & Welcome Thread - February 2020 · 2020-02-21T16:30:01.308Z · LW · GW

Offering 100-300h of technical work on an AI Safety project

I am a deep learning engineer (2y exp), I currently develop vision models to be used on satellite images (I also do some software engineering around that) (Linkedin profile https://www.linkedin.com/in/maxime-riche-73696182/). On my spare time, I am organizing a EA local group in Toulouse (France), learning RL, doing a research project on RL for computer vision (only expecting indirect utility from this) and developing an EAA tool (EffectiveAnimalAdvocacy). I have been in the French EA community for 4 years. In 2020, I chose to work part time to dedicate 2 to 3 days of work per week to EA aligned projects.Thus for the next 8 months, I have ~10h / week that I want to dedicate to assist an AI safety project. For myself, I am not looking for funds, nor to publish myself a paper, nor a blog post.To me the ideal project would be:

  • a relevant technical AI safety project (research or not). I am looking for advice on the "relevant" part.
  • where I would be able to help the project to achieve better quality results than otherwise without my contribution. (e.g. through writing better code, doing more experiments, testing other designs)
  • where I can learn more about technical AI safety
  • where my contribution would include writing code. If it is a research proposal, then implement experiments. If there is no experimental part currently in the project, I could take charge of creating one.