Posts

Alignment: "Do what I would have wanted you to do" 2024-07-12T16:47:24.090Z
Fix simple mistakes in ARC-AGI, etc. 2024-07-09T17:46:50.364Z
I'm a bit skeptical of AlphaFold 3 2024-06-25T00:04:41.274Z

Comments

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-11T13:33:18.896Z · LW · GW

A variation on this:

Any expression should be considered for replacement by a slightly bigger or smaller one. For example,

z = f(x**2 * y)

should be replaceable by

z = f((x**2 - 1) * y)

The generated programs are quite short, so I would guess that this multiplies their number by 100–1000 if you consider one perturbation at a time.
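
Here's a minimal sketch of this variation in Python (my own illustration, not code from the post; it uses the standard ast module, and the names are made up):

import ast

class Perturb(ast.NodeTransformer):
    """Replace the target-th value-context expression node with (node +/- 1)."""
    def __init__(self, target, delta):
        self.target = target
        self.delta = delta
        self.seen = -1

    def visit(self, node):
        node = self.generic_visit(node)  # transform children first (post-order)
        is_value_expr = isinstance(node, ast.expr) and isinstance(
            getattr(node, "ctx", ast.Load()), ast.Load)
        if is_value_expr:
            self.seen += 1
            if self.seen == self.target:
                op = ast.Add() if self.delta > 0 else ast.Sub()
                node = ast.BinOp(left=node, op=op, right=ast.Constant(abs(self.delta)))
        return node

def perturbed_variants(src):
    """Every program obtainable from src by one +1 or -1 perturbation of one expression."""
    n = sum(
        isinstance(node, ast.expr)
        and isinstance(getattr(node, "ctx", ast.Load()), ast.Load)
        for node in ast.walk(ast.parse(src))
    )
    for target in range(n):
        for delta in (+1, -1):
            tree = Perturb(target, delta).visit(ast.parse(src))
            yield ast.unparse(ast.fix_missing_locations(tree))

for variant in perturbed_variants("z = f(x ** 2 * y)"):
    print(variant)  # includes z = f((x ** 2 - 1) * y), among others

Some of the variants will be syntactically valid nonsense (e.g. perturbing the function name itself), but those are cheap to discard by checking the candidates against the training examples.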

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-11T02:08:44.932Z · LW · GW

If GPT-4o made the off-by-one error, is it reasonable to expect GPT-3.5 to spot it?

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-10T21:04:38.573Z · LW · GW

@ryan_greenblatt's approach also asks GPT-4o to improve its previous guesses.

These calls are expensive though.

The idea of Program Dithering is to generate many candidate programs cheaply.

Comment by Oleg Trott (oleg-trott) on Fix simple mistakes in ARC-AGI, etc. · 2024-07-10T01:07:55.892Z · LW · GW

If you have $n$ locations that you want to perturb, then trying a single off-by-one perturbation at a time adds $2n$ programs (each location can be perturbed by $+1$ or $-1$). With two at a time, this adds $2n^2$ programs.

There's a possible optimization, where you only try this on tasks for which no unperturbed program was found (<28%).

EDIT: Ironically, I made an off-by-one error, which Program Dithering would have fixed: this should be $2n(n-1)$.
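
A quick numeric sanity check of the corrected count (my own sketch, assuming each chosen location is independently perturbed by $+1$ or $-1$):

from itertools import combinations, product

n = 7  # arbitrary example size

# one perturbation at a time: choose a location and a sign
one_at_a_time = sum(1 for _ in product(range(n), (+1, -1)))

# two at a time: choose a pair of locations and a sign for each
two_at_a_time = sum(
    1 for _ in combinations(range(n), 2) for _ in product((+1, -1), repeat=2))

assert one_at_a_time == 2 * n            # 2n
assert two_at_a_time == 2 * n * (n - 1)  # 4 * C(n, 2) = 2n(n - 1)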

Comment by Oleg Trott (oleg-trott) on How good are LLMs at doing ML on an unknown dataset? · 2024-07-03T16:29:16.631Z · LW · GW

This looks similar, in spirit, to Large Language Models as General Pattern Machines:

https://arxiv.org/abs/2307.04721 

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-26T22:38:59.370Z · LW · GW

I'm surprised by how knowledgeable people on this site are about this!

BTW, there's some discussion of this happening on the CCL mailing list (limited to professionals in relevant fields) if you are interested.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-26T19:16:20.588Z · LW · GW

Right. The benchmark (their test set) just compares 3D structures.

Side note: 52% also seems low for Vina, but I haven't looked into this further. Maybe the benchmark is hard, or maybe the user-specified "search space" was too big.

On their other test (in their Extended Data), both Vina and AF3 do much better. 

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-26T16:40:04.278Z · LW · GW

Unlike Vina, AF3 only predicts 3D structures, I believe. It does not predict binding affinities.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-25T17:33:47.738Z · LW · GW

Determining 3D structures is expensive.

The most realistic thing one could do is repeat this work, with the same settings, but using k-fold cross-validation where the test and training sets are never related (like what I did at Columbia).

This will show how well (or poorly, as the case may be) the method generalizes to unrelated proteins.
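
For concreteness, a minimal sketch of such a split (my illustration, not the AF3 paper's protocol; it assumes the proteins have already been clustered by sequence similarity, e.g. with MMseqs2 or CD-HIT, and uses scikit-learn's GroupKFold):

from sklearn.model_selection import GroupKFold

def unrelated_folds(X, y, cluster_ids, k=5):
    """k-fold splits in which no sequence-similarity cluster appears
    in both the training and the test side of any fold."""
    gkf = GroupKFold(n_splits=k)
    # cluster_ids[i] is the cluster of the protein in complex i;
    # GroupKFold keeps each cluster entirely on one side of the split.
    for train_idx, test_idx in gkf.split(X, y, groups=cluster_ids):
        yield train_idx, test_idx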

I hope someone does it.

Comment by Oleg Trott (oleg-trott) on I'm a bit skeptical of AlphaFold 3 · 2024-06-25T08:35:17.217Z · LW · GW

(ligand = drug-like molecule, for anyone else reading)

Right, I didn't mean exact bitwise memory comparisons.

The dataset is redundant(ish), simply as an artifact of how it's constructed:

For example, if people know that X binds A, and X ≈ Y, and A ≈ B, they'll try to add X+B, Y+A, and Y+B to the dataset also.

And this makes similarity-based predictions look much more useful than they actually are, because in the "real world" you will need to make predictions about dissimilar molecules from some collection.
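
A toy illustration of this effect (my own sketch, not from the thread): give a dataset purely random labels, plant near-duplicates, split randomly, and a nearest-neighbour "prediction" still looks far better than chance:

import random

random.seed(0)

# 100 archetype measurements with random (uninformative) binary labels,
# each added twice with a tiny perturbation -- mimicking how X+A, X+B,
# Y+A and Y+B all end up in the dataset when X ≈ Y and A ≈ B.
archetypes = [(random.random(), random.randrange(2)) for _ in range(100)]
data = [(x + random.gauss(0, 1e-6), label)
        for x, label in archetypes for _ in range(2)]

random.shuffle(data)
train, test = data[:100], data[100:]

def predict(x):  # 1-nearest-neighbour by similarity
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

accuracy = sum(predict(x) == label for x, label in test) / len(test)
print(f"accuracy on random labels: {accuracy:.2f}")  # well above the 0.5 chance level

# Splitting by archetype instead, so that near-duplicates never straddle
# the train/test boundary, would send this back to about 0.5.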

I hope this makes sense.