## Posts

Alignment: "Do what I would have wanted you to do" 2024-07-12T16:47:24.090Z
Fix simple mistakes in ARC-AGI, etc. 2024-07-09T17:46:50.364Z
I'm a bit skeptical of AlphaFold 3 2024-06-25T00:04:41.274Z

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-11T13:33:18.896Z · LW · GW

A variation on this:

Any expression should be considered for replacement by a slightly bigger or smaller one. For example,

`z = f(x**2 * y)`

should be replaceable by

`z = f((x**2 - 1) * y)`

The generated programs are quite short. So I would guess that this multiplies their number by 100-1000, if you consider one perturbation at a time.
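A minimal sketch of how this enumeration might look (my own illustration, not from the post; `perturbations` and its helpers are hypothetical names, and `ast.unparse` needs Python 3.9+):

```python
import ast

def is_candidate(node):
    # Perturb any expression that is read (skip assignment targets).
    return isinstance(node, ast.expr) and not isinstance(
        getattr(node, "ctx", None), (ast.Store, ast.Del))

class NudgeNth(ast.NodeTransformer):
    """Replace the n-th candidate expression with (expr +/- 1)."""
    def __init__(self, n, delta):
        self.n, self.delta, self.seen = n, delta, 0

    def visit(self, node):
        node = self.generic_visit(node)  # transform children first
        if is_candidate(node):
            if self.seen == self.n:
                op = ast.Add() if self.delta > 0 else ast.Sub()
                node = ast.BinOp(left=node, op=op,
                                 right=ast.Constant(value=1))
            self.seen += 1
        return node

def perturbations(source):
    """Yield variants of `source` with one expression nudged by +/- 1."""
    total = sum(is_candidate(n) for n in ast.walk(ast.parse(source)))
    for i in range(total):
        for delta in (+1, -1):
            tree = NudgeNth(i, delta).visit(ast.parse(source))
            ast.fix_missing_locations(tree)
            yield ast.unparse(tree)

# Some variants will be nonsense (e.g. perturbing a function name);
# they simply fail when checked against the task's examples.
for variant in perturbations("z = f(x**2 * y)"):
    print(variant)
```

Among the printed variants is the `z = f((x ** 2 - 1) * y)` example from above.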

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-11T02:08:44.932Z · LW · GW

If GPT-4o made the off-by-one error, is it reasonable to expect GPT-3.5 to spot it?

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-10T21:04:38.573Z · LW · GW

@ryan_greenblatt's approach also asks GPT-4o to improve its previous guesses.

These calls are expensive, though.

The idea of Program Dithering is to generate many candidate programs cheaply.

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-10T01:07:55.892Z · LW · GW

If you have $n$ locations that you want to perturb, then if you try a single off-by-one perturbation at a time, this adds $2n$ programs (each location can be nudged up or down by one). With two at a time, this adds $4\binom{n}{2} = 2n(n-1)$ programs.

There's a possible optimization, where you only try this on tasks where no unperturbed program was found (< 28% of tasks).

EDIT: Ironically, I made an off-by-one error in the two-at-a-time count, which Program Dithering would have fixed: it should be $2n(n-1)$.
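To make the counting concrete, a small self-check (assuming, as above, that each location can be nudged up or down by one):

```python
from itertools import combinations, product

n = 5            # number of perturbable locations
deltas = (+1, -1)

# One perturbation at a time: pick a location and a direction.
single = [(i, d) for i in range(n) for d in deltas]
assert len(single) == 2 * n

# Two at a time: pick two distinct locations and a direction for each.
double = [((i, di), (j, dj))
          for i, j in combinations(range(n), 2)
          for di, dj in product(deltas, repeat=2)]
assert len(double) == 4 * (n * (n - 1) // 2) == 2 * n * (n - 1)

print(len(single), len(double))  # 10 40
```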

Comment by Oleg Trott (oleg-trott) on How good are LLMs at doing ML on an unknown dataset? · 2024-07-03T16:29:16.631Z · LW · GW

This looks similar, in spirit, to *Large Language Models as General Pattern Machines*.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-26T22:38:59.370Z · LW · GW

BTW, there's some discussion of this happening on the CCL mailing list (limited to professionals in relevant fields), if you are interested.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-26T19:16:20.588Z · LW · GW

Right. The benchmark (their test set) just compares 3D structures.

Side note: 52% also seems low for Vina, but I haven't looked into this further. Maybe the benchmark is hard, or maybe the "search space" (user-specified) was too big.

On their other test (in their Extended Data), both Vina and AF3 do much better.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-26T16:40:04.278Z · LW · GW

Unlike Vina, AF3 only predicts 3D structures, I believe. It does not predict binding affinities.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-25T17:33:47.738Z · LW · GW

Determining 3D structures is expensive.

The most realistic thing one could do is repeat this work, with the same settings, but using k-fold cross-validation, where test and training sets are never related (like what I did at Columbia).

This will show how well (or poorly, as the case may be) the method generalizes to unrelated proteins.
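For anyone attempting that, here is a minimal sketch of such a split, assuming each protein-ligand complex has already been assigned a family/cluster ID (e.g., from sequence clustering); scikit-learn's `GroupKFold` is my illustrative choice here, not anything from the AF3 paper:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-ins: 10 complexes, each tagged with a protein-family ID.
X = np.random.rand(10, 4)               # features (placeholder)
y = np.random.rand(10)                  # targets (placeholder)
families = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4])

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(
        gkf.split(X, y, groups=families)):
    # No family ever straddles a split, so every test protein is
    # unrelated (at the family level) to anything seen in training.
    assert not set(families[train_idx]) & set(families[test_idx])
    print(fold, sorted(set(families[test_idx])))
```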

I hope someone does it.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-25T08:35:17.217Z · LW · GW

(ligand = drug-like molecule, for anyone else reading)

Right, I didn't mean exact bitwise memory comparisons.

The dataset is redundant(ish), simply as an artifact of how it's constructed:

For example, if people know that X binds A, and X ≈ Y and A ≈ B, they'll try to add the pairs X+B, Y+A, and Y+B to the dataset as well.

And this makes similarity-based predictions look much more useful than they actually are, because in the "real world" you will need to make predictions about molecules that are dissimilar to anything in your collection.
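As a concrete illustration of the fix (my own sketch, not from the paper): filter the test set so nothing in it is too similar to any training molecule before evaluating. The SMILES strings and the 0.4 cutoff here are arbitrary toy choices.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

train_smiles = ["CCO", "c1ccccc1O"]           # toy training set
test_smiles = ["CCO", "CC(=O)Nc1ccc(O)cc1"]   # toy test set

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

train_fps = [fingerprint(s) for s in train_smiles]

THRESHOLD = 0.4  # Tanimoto cutoff; the right value is a judgment call
kept = [s for s in test_smiles
        if max(DataStructs.TanimotoSimilarity(fingerprint(s), fp)
               for fp in train_fps) < THRESHOLD]

# "CCO" is dropped (identical to a training molecule); what else
# survives depends on the cutoff.
print(kept)
```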

I hope this makes sense.