Posts

Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement 2022-11-04T18:09:04.759Z
Ethan Caballero's Shortform 2022-09-11T19:52:49.652Z

Comments

Comment by Ethan Caballero (ethan-caballero) on We may be able to see sharp left turns coming · 2023-06-10T21:50:40.344Z · LW · GW

Read Section 6 titled “The Limit of the Predictability of Scaling Behavior” in this paper: 
https://arxiv.org/abs/2210.14891

Comment by Ethan Caballero (ethan-caballero) on PaLM-2 & GPT-4 in "Extrapolating GPT-N performance" · 2023-06-05T05:58:06.374Z · LW · GW

We describe how to fit a BNSL to yield the best extrapolation in the last paragraph of Appendix Section A.6, "Experimental details of fitting BNSL and determining the number of breaks," of the paper: 
https://arxiv.org/pdf/2210.14891.pdf#page=13
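
A minimal sketch of what such a fit can look like, assuming a single break and ordinary least squares on log10(y) via scipy.optimize.curve_fit (the appendix describes the actual procedure, including how initializations and the number of breaks are chosen); the toy data, initial guess, and log-parameterization of the break location below are purely illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def log_bnsl_one_break(x, a, b, c0, log10_d1, c1, f1):
    # log10 of equation 1 of https://arxiv.org/abs/2210.14891 with n = 1 break;
    # the break location is parameterized by its log10 for numerical stability.
    d1 = 10.0 ** log10_d1
    y = a + b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)
    return np.log10(np.clip(y, 1e-12, None))  # clip keeps the log well-defined mid-fit

# Toy "observed" scaling data; replace with real (scale, loss) measurements.
rng = np.random.default_rng(0)
x = np.logspace(6, 10, 25)
y = 0.1 + 5.0 * x ** (-0.1) * (1.0 + (x / 1e8) ** (1.0 / 0.3)) ** (-0.25 * 0.3)
y *= np.exp(rng.normal(0.0, 0.01, x.shape))

p0 = [0.05, 1.0, 0.05, 8.0, 0.1, 0.5]  # illustrative initial guess
popt, _ = curve_fit(log_bnsl_one_break, x, np.log10(y), p0=p0, maxfev=100_000)
print(dict(zip(["a", "b", "c0", "log10_d1", "c1", "f1"], np.round(popt, 4))))

# Extrapolate beyond the fitted range:
x_extrap = np.logspace(10, 12, 5)
print(10.0 ** log_bnsl_one_break(x_extrap, *popt))
```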

Comment by Ethan Caballero (ethan-caballero) on PaLM-2 & GPT-4 in "Extrapolating GPT-N performance" · 2023-05-30T23:12:35.948Z · LW · GW

Sigmoids don't accurately extrapolate the scaling behavior(s) of the performance of artificial neural networks. 

Use a Broken Neural Scaling Law (BNSL) in order to obtain accurate extrapolations: 
https://arxiv.org/abs/2210.14891
https://arxiv.org/pdf/2210.14891.pdf
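
For reference, a minimal sketch of the BNSL functional form itself (equation 1 of the paper); the parameter values in the example call are purely illustrative:

```python
import numpy as np

def bnsl(x, a, b, c0, breaks):
    """Broken Neural Scaling Law, equation 1 of https://arxiv.org/abs/2210.14891.

    x      : quantity being scaled (training compute, dataset size, parameters, ...)
    a      : limit that the loss/error approaches as x grows
    b, c0  : scale and exponent of the initial power-law segment
    breaks : list of (d_i, c_i, f_i) = (location, change in slope, sharpness) per break
    """
    y = b * x ** (-c0)
    for d_i, c_i, f_i in breaks:
        y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

# Illustrative example: one break at x = 1e8, evaluated from 1e6 to 1e11.
x = np.logspace(6, 11, 6)
print(bnsl(x, a=0.1, b=5.0, c0=0.1, breaks=[(1e8, 0.25, 0.3)]))
```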
 

Comment by Ethan Caballero (ethan-caballero) on GPT-4 · 2023-03-15T06:22:47.632Z · LW · GW

Did ARC try making a scaling plot with training compute on the x-axis and autonomous replication on the y-axis?

Comment by Ethan Caballero (ethan-caballero) on AI Safety in a World of Vulnerable Machine Learning Systems · 2023-03-09T09:40:14.155Z · LW · GW

The setting was adversarial training and adversarial evaluation. During training, a PGD attacker with 30 iterations is used to construct the adversarial examples used for training; during testing, evaluation is on an adversarial test set constructed via a PGD attacker with 20 iterations.

The experimental data for the y-axis is obtained from Table 7 of https://arxiv.org/abs/1906.03787; the experimental data for the x-axis is obtained from Figure 7 of https://arxiv.org/abs/1906.03787.
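
For concreteness, a minimal PyTorch-style sketch of an L-infinity PGD attack loop like the one described above (this is not the cited paper's code; the epsilon and step size are illustrative assumptions, and only the iteration counts, 30 at training time and 20 at evaluation time, come from the setup above):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, n_iter=30):
    """L-infinity PGD attack; eps and alpha are illustrative, not the cited paper's values.
    n_iter=30 matches the training-time attacker above; use n_iter=20 for evaluation."""
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep a valid image
    return x_adv.detach()

# Adversarial training step: train on pgd_attack(model, x, y, n_iter=30);
# adversarial evaluation: report accuracy on pgd_attack(model, x, y, n_iter=20).
```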

Comment by Ethan Caballero (ethan-caballero) on AI Safety in a World of Vulnerable Machine Learning Systems · 2023-03-08T16:53:46.053Z · LW · GW

"However, to the best of our knowledge there are no quantitative scaling laws for robustness yet."


For scaling laws for adversarial robustness, see appendix A.15 of openreview.net/pdf?id=sckjveqlCZ#page=22

Comment by Ethan Caballero (ethan-caballero) on AI Safety in a World of Vulnerable Machine Learning Systems · 2023-03-08T16:52:25.891Z · LW · GW

Comment by Ethan Caballero (ethan-caballero) on Ethan Caballero on Private Scaling Progress · 2023-02-04T22:09:34.834Z · LW · GW

arxiv.org/abs/2210.14891

Comment by Ethan Caballero (ethan-caballero) on Parameter Scaling Comes for RL, Maybe · 2023-01-24T22:49:01.012Z · LW · GW

See Section 5.3, "Reinforcement Learning," of https://arxiv.org/abs/2210.14891 for more RL scaling laws with the number of model parameters on the x-axis (and also RL scaling laws with training compute or training dataset size on the x-axis).
 

Comment by Ethan Caballero (ethan-caballero) on Whisper's Wild Implications · 2023-01-04T02:12:24.300Z · LW · GW

re: youtube estimates

You'll probably find some of this twitter discussion useful:
https://twitter.com/HenriLemoine13/status/1572846452895875073

Comment by Ethan Caballero (ethan-caballero) on Evidence on recursive self-improvement from current ML · 2023-01-02T18:52:15.423Z · LW · GW

OP will find this paper useful:
https://arxiv.org/abs/2210.14891

Comment by Ethan Caballero (ethan-caballero) on How is the "sharp left turn defined"? · 2022-12-09T02:38:47.493Z · LW · GW

I give a crisp definition from 6:27 to 7:50 of this video: 

Comment by Ethan Caballero (ethan-caballero) on AI Forecasting Research Ideas · 2022-11-18T13:33:45.496Z · LW · GW

> Re: "Extrapolating GPT-N performance" and "Revisiting ‘Is AI Progress Impossible To Predict?’" sections of google doc


Read Section 6 titled "The Limit of the Predictability of Scaling Behavior" of "Broken Neural Scaling Laws" paper: 
https://arxiv.org/abs/2210.14891

Comment by Ethan Caballero (ethan-caballero) on Current themes in mechanistic interpretability research · 2022-11-17T06:57:49.795Z · LW · GW

One other goal / theme of mechanistic interpretability research imo: 
twitter.com/norabelrose/status/1588571609128108033

Comment by Ethan Caballero (ethan-caballero) on Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement · 2022-11-06T20:20:01.449Z · LW · GW

When the f of the next break (f as in equation 1 of the paper ( https://arxiv.org/abs/2210.14891 ), not the video) is sufficiently large, it gives you the predictive ability to determine when that next break will occur (although the number of seeds needed to get such predictive ability is very large). When the f of the next break is sufficiently small (and nonnegative), it does not give you the predictive ability to determine when that next break will occur.

Play around with f in this code to see what I mean: 
https://github.com/ethancaballero/broken_neural_scaling_laws/blob/main/make_figure_1__decomposition_of_bnsl_into_power_law_segments.py#L25-L29
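
A small self-contained sketch of the same point (not the linked script; all parameter values are illustrative): with a large f, the break's influence on the curve is already visible well before the break location d, so measurements taken before the break carry information about it; with a sufficiently small nonnegative f, the curve is essentially indistinguishable from the pre-break power law until the break actually happens.

```python
def bnsl_one_break(x, a, b, c0, d1, c1, f1):
    # Equation 1 of https://arxiv.org/abs/2210.14891 with n = 1 break
    return a + b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

d1 = 1e6                                   # break location
x_before = 1e5                             # a point one decade before the break
pure_power_law = 1.0 * x_before ** (-0.2)  # the same curve with the break removed
for f1 in (2.0, 0.5, 0.05, 0.001):
    y = bnsl_one_break(x_before, a=0.0, b=1.0, c0=0.2, d1=d1, c1=0.3, f1=f1)
    print(f"f = {f1:<6} relative deviation before the break: {1 - y / pure_power_law:.4%}")
```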

Comment by Ethan Caballero (ethan-caballero) on Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement · 2022-11-04T21:20:54.658Z · LW · GW

https://discord.com/channels/729741769192767510/785968841301426216/958570285760647230

Comment by Ethan Caballero (ethan-caballero) on Path dependence in ML inductive biases · 2022-09-11T03:37:35.122Z · LW · GW

Sections 3.1 and 6.6 ("Ossification") of the "Scaling Laws for Transfer" paper (https://arxiv.org/abs/2102.01293) show that current training of DNNs exhibits high path dependence.