AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021

post by gwern · 2021-07-15T19:27:20.584Z · LW · GW · 10 comments

This is a link post for https://www.nature.com/articles/s41586-021-03819-2

10 comments

Comments sorted by top scores.

comment by [deleted] · 2021-07-24T15:10:22.856Z · LW(p) · GW(p)

Personally, I can confirm that every yeast protein I work with that does not have a structure, when fed through AlphaFold, produces absolute garbage, with mean predicted errors on the order of ten or twenty angstroms and obvious nonsense in the structure.

Granted, I work with a lot of repetitive, poorly structured proteins, which, in an organism as much of a model system as yeast, are the only ones without structures, and someone has to get unlucky... but still.

Replies from: Charlie Steiner
comment by Charlie Steiner · 2021-07-26T03:06:25.318Z · LW(p) · GW(p)

Have you been able to try the academic counterpart (RoseTTAFold)?

Replies from: None
comment by [deleted] · 2021-07-27T04:47:55.443Z · LW(p) · GW(p)

Not yet, I used the Google project where they are posting predicted structures of every known human and yeast gene.

https://alphafold.ebi.ac.uk/

The example that made me laugh:

https://alphafold.ebi.ac.uk/entry/Q59W62

comment by Lech Mazur (lechmazur) · 2021-07-15T21:29:04.306Z · LW(p) · GW(p)

Related development: https://www.nature.com/articles/d41586-021-01968-y

"Meanwhile, an academic team has developed its own protein-prediction tool inspired by AlphaFold 2, which is already gaining popularity with scientists. That system, called RoseTTaFold, performs nearly as well as AlphaFold 2, and is described in a paper in Science paper also published on 15 July[2] "

comment by Charlie Steiner · 2021-07-26T03:15:06.222Z · LW(p) · GW(p)

What this makes me think is that quantum computing is mostly doomed. The killer app for quantum computing is predicting molecules and electronic structures. (Perhaps someone would pay for Shor's algorithm, but its coolness far outstrips its economic value.) But it's probably a lot cheaper to train a machine-learning-based approximation on a bunch of painstakingly assembled data than it is to build enough 50 millikelvin cryostats. On this view, the physics labs that will win at superconductor prediction are not the ones working on quantum computers or on theoretical breakthroughs; they're going to be the guys converting every phonon spectrum from the last 50 years into a common data format so they can spend $30K to train a big 3D transformer on it.

comment by ryan_b · 2021-07-15T19:33:29.365Z · LW(p) · GW(p)

"Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even where no similar structure is known."

Holy crap. I confess this one catches me by surprise; within my hopes, but beyond my expectations.

Replies from: Charlie Steiner
comment by Charlie Steiner · 2021-07-15T20:03:33.032Z · LW(p) · GW(p)

Pretty sure this is the same (impressive) news as from CASP14 ( https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/ ). But with fancier figures (edit: and more technical details of how they made the predictions) :P

Replies from: gwern
comment by gwern · 2021-07-15T20:31:42.371Z · LW(p) · GW(p)

The previous AF2 discussions were largely a waste of space because the little abstract they had to provide for CASP14 provided hardly anything to go on. But now we have not just a full writeup but source code and models too! Now I consider it worth discussing.

comment by jmh · 2021-07-16T14:10:34.799Z · LW(p) · GW(p)

As I recall, the accuracy measurement was something like an average of the deviation over the whole molecule, which could allow small (local) portions of the predicted shape to differ from the true shape a good bit more.

First, is that a correct recollection? If so, does anyone know of any work exploring how much local deviations matter relative to globally averaged metrics? I would think that would be very important in this type of modeling.
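
To make the concern concrete, here is a minimal toy sketch in Python of how a whole-molecule average can hide a badly wrong local region. It is not the metric used in CASP14 or in the paper (CASP's headline score is GDT_TS, and the paper also reports lDDT-Cα, which is a local measure); it just uses plain RMSD on made-up coordinates. The function names, the 100-residue chain, and the offsets are all hypothetical, and the two structures are assumed to be already superimposed.

```python
import math

def per_residue_deviation(pred, true):
    """Distance between matched atoms; assumes the structures are already superimposed."""
    return [math.dist(p, t) for p, t in zip(pred, true)]

def global_rmsd(pred, true):
    """Root-mean-square deviation averaged over every residue."""
    devs = per_residue_deviation(pred, true)
    return math.sqrt(sum(d * d for d in devs) / len(devs))

# Hypothetical "true" structure: 100 residues spaced 1 unit apart on a line.
true_coords = [(float(i), 0.0, 0.0) for i in range(100)]

# Hypothetical prediction: every residue is off by 0.5 units in y ...
pred_coords = [(x, y + 0.5, z) for x, y, z in true_coords]

# ... and a 5-residue stretch (residues 40-44) is additionally off by 8 units in z.
for i in range(40, 45):
    x, y, z = pred_coords[i]
    pred_coords[i] = (x, y, z + 8.0)

devs = per_residue_deviation(pred_coords, true_coords)
rmsd = global_rmsd(pred_coords, true_coords)
worst = max(devs)

print(f"global RMSD:           {rmsd:.2f}")
print(f"worst single residue:  {worst:.2f}")
print(f"residues off by > 4:   {sum(d > 4.0 for d in devs)} of {len(devs)}")
```

In this made-up case the global number stays under 2 units even though five residues are off by more than 8, which is the kind of gap between averaged and local accuracy the question is asking about.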