Talk about your research
post by snarles · 2011-05-15T01:32:29.525Z · LW · GW
There must be quite a few undergrad/graduate/post-doc/???-level researchers on LessWrong. I'm interested in hearing about your work. I'll post about myself in the comments.
comment by snarles · 2011-05-15T01:37:19.463Z · LW(p) · GW(p)
My current project involves, in a broad sense, finding ways to fit complex (or mathematically inconvenient) models to data. If you can't fit your models to data, what's the point of the model, or the data?
Specifically, I'm working on statistical inference on protein interaction networks. I'm trying to find ways to fit generative models (such as preferential attachment, duplication-mutation-complementation) to network data in the form of adjacency matrices. Surprisingly little work on the subject has been done so far; a promising approach is Approximate Bayesian Computation.
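To give a flavor of how ABC works here, below is a minimal sketch of ABC rejection sampling in Python. The generative model and summary statistic are toy stand-ins (a plain random graph and its mean degree), not the actual network models or statistics I use:

```python
import random

def simulate_network(q, n):
    """Toy generative model: connect each pair of nodes with probability q.
    (A stand-in for preferential attachment, DMC, etc.)"""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < q:
                adj[i][j] = adj[j][i] = 1
    return adj

def mean_degree(adj):
    """Toy summary statistic of a network: the average node degree."""
    return sum(map(sum, adj)) / len(adj)

def abc_rejection(observed, n_draws=5000, tolerance=0.3):
    """ABC rejection: draw parameters from the prior, simulate a network for
    each draw, and keep the draws whose simulated summary statistic lands
    close to the observed one. The kept draws approximate the posterior."""
    n = len(observed)
    s_obs = mean_degree(observed)
    accepted = []
    for _ in range(n_draws):
        q = random.random()                       # uniform prior on q
        if abs(mean_degree(simulate_network(q, n)) - s_obs) < tolerance:
            accepted.append(q)
    return accepted

if __name__ == "__main__":
    random.seed(0)
    observed = simulate_network(0.1, 30)          # pretend this is real data
    posterior = abc_rejection(observed)
    print(len(posterior), sum(posterior) / len(posterior))
```

The appeal of ABC is that it never needs the likelihood of an adjacency matrix under the model, only the ability to simulate from it.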
↑ comment by PhilGoetz · 2011-05-15T08:18:49.868Z · LW(p) · GW(p)
What's the goal - what are you trying to infer? I don't know what you mean by "preferential attachment, duplication-mutation-complementation".
↑ comment by snarles · 2011-05-15T13:19:01.428Z · LW(p) · GW(p)
What we are trying to do is infer the evolutionary mechanisms by which organisms evolve increasingly complex systems of interacting proteins. A protein interaction network is a map of the organism's proteins and their interactions in the form of a graph: we represent each protein with a node, and draw an edge between two proteins if they can form a complex.
One of the first observations made about protein networks is that the degree distributions of their graphs seem to obey a power law: a few nodes have a huge number of interaction partners and act as 'hubs' for the network. Furthermore, hubs are less likely to interact with other hubs.
One evolutionary mechanism proposed to explain this power law is the preferential attachment model. In this model, the interaction network gains new proteins one at a time: each added protein forms a single edge with an existing protein in the network, and it is disproportionately likely to attach to a 'hub.' (Technically speaking, the probability of choosing a particular existing node is proportional to that node's degree.)
The preferential attachment model explains the power-law degree distribution, but it is too simplistic by itself: it can only produce networks with tree-like structure.
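For concreteness, here is a minimal sketch of preferential attachment growth in plain Python (illustrative only, not code from my project):

```python
import random

def preferential_attachment(n_nodes):
    """Grow a graph one node at a time; each new node forms a single edge
    with an existing node chosen with probability proportional to its degree."""
    degree = {0: 1, 1: 1}                 # seed: two nodes joined by one edge
    edges = [(0, 1)]
    for new in range(2, n_nodes):
        nodes = list(degree)
        target = random.choices(nodes, weights=[degree[v] for v in nodes])[0]
        edges.append((new, target))
        degree[new] = 1
        degree[target] += 1
    return edges, degree

if __name__ == "__main__":
    random.seed(0)
    _, deg = preferential_attachment(2000)
    # a handful of 'hub' nodes end up with very high degree (heavy-tailed)
    print(sorted(deg.values(), reverse=True)[:5])
```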
Another model for network growth is DMC (duplication-mutation-complementation), also called DDA (duplication-divergence-attachment), a model based on the biological phenomenon of gene duplication.
In DMC, there is a duplication step and a mutation step. In the duplication step, we duplicate one of the existing proteins at random. Say protein O is the existing protein and protein P is the duplicate. The duplicate inherits all of the interaction partners of the original protein: if protein O formed complexes OE, OF, OG, then the duplicate forms complexes PE, PF, PG.
In the mutation step, the original and the duplicate degenerate in a complementary fashion. Either copy can lose the ability to form a particular complex, so long as the other copy still provides it. For example, protein O can lose the ability to form complex OE, but as long as its copy P can still form PE, the organism remains viable. Thus an organism might go from having complexes {OE, OF, OG, PE, PF, PG} after the duplication step to having {PE, OF, OG} after the mutation step.
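As a rough illustration (not the exact variant used in the literature), a single DMC step could be sketched like this in Python, with a hypothetical loss_prob parameter standing in for the loss rate I mention further down:

```python
import random

def dmc_step(net, loss_prob):
    """One DMC step on a network stored as {protein: set of interaction partners}.
    Duplication: copy a random protein O into a new protein P that inherits all
    of O's interaction partners. Mutation: for each inherited partner, with
    probability loss_prob one of the two copies (chosen at random) loses that
    interaction -- the other copy always keeps it, so the complex stays covered."""
    original = random.choice(list(net))
    duplicate = max(net) + 1
    net[duplicate] = set(net[original])
    for partner in net[duplicate]:
        net[partner].add(duplicate)
    for partner in list(net[original]):
        if random.random() < loss_prob:
            loser = random.choice([original, duplicate])
            net[loser].discard(partner)
            net[partner].discard(loser)

if __name__ == "__main__":
    random.seed(0)
    net = {0: {1}, 1: {0}}                # seed network: one interacting pair
    for _ in range(200):
        dmc_step(net, loss_prob=0.4)
    n_edges = sum(len(v) for v in net.values()) // 2
    print(len(net), "proteins,", n_edges, "interactions")
```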
We can build models of network generation based on preferential attachment, duplication-mutation-complementation, or a mixture of the two, as proposed in Ratmann (2007).
For any particular model we want to see how well it explains the observed data. Given models M1, M2, M3, we can compare the probability of observing the data given the models, P(D|M1), P(D|M2), P(D|M3). We say that the model with the highest probability is the model which best explains the data.
However, a particular model might have a parameter q. For example, in the duplication-mutation-complementation model, we might want to vary the rate at which complexes are lost in the mutation step. For a model M with a parameter q, the probability of observing the data will depend on the parameter. For a certain parameter value, we write P(D|M,q) for this probability.
Now there are two different approaches to calculating P(D|M) in this situation. The Bayesian approach is to assume a prior distribution (such as a uniform distribution) on the parameter, P(q), and then compute P(D|M) = Integral[P(D|M,q)P(q) dq] over the parameter space.
The classical approach of maximum likelihood is simply to take P(D|M) = max[P(D|M,q)] over the parameter space. If we are comparing models, however, we will want to apply a penalty for the complexity of the model, so in fact we take P(D|M) = c max[P(D|M,q)], where c is a complexity penalty (e.g. according to the Akaike Information Criterion).
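To make the two approaches concrete, here is a toy sketch in Python that swaps the network model for a one-parameter coin-flip model (purely illustrative; the AIC-style factor exp(-k) comes from comparing exp(-AIC/2) across models):

```python
import math

def likelihood(k, n, q):
    """P(D | M, q): probability of k 'successes' in n trials under parameter q."""
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

def bayes_evidence(k, n, grid=1000):
    """P(D | M) = integral of P(D | M, q) P(q) dq with a uniform prior on q,
    approximated on a grid of parameter values."""
    qs = [(i + 0.5) / grid for i in range(grid)]
    return sum(likelihood(k, n, q) for q in qs) / grid

def penalized_max_likelihood(k, n, n_params=1, grid=1000):
    """max over q of P(D | M, q), discounted by an AIC-style factor exp(-n_params),
    i.e. comparing exp(-AIC/2) = max-likelihood * exp(-number of parameters)."""
    qs = [(i + 0.5) / grid for i in range(grid)]
    return math.exp(-n_params) * max(likelihood(k, n, q) for q in qs)

if __name__ == "__main__":
    k, n = 70, 100                                   # toy data: 70 successes in 100 trials
    print("fixed-parameter model  P(D|M1) =", likelihood(k, n, 0.5))
    print("Bayesian evidence      P(D|M2) =", bayes_evidence(k, n))
    print("AIC-penalized ML       P(D|M2) =", penalized_max_likelihood(k, n))
```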
↑ comment by jsteinhardt · 2011-05-15T04:27:42.970Z · LW(p) · GW(p)
Do you happen to work with Mike Jordan? This sounds strikingly similar to what he does.
comment by Daniel_Burfoot · 2011-05-15T04:48:33.681Z · LW(p) · GW(p)
I am working on large scale lossless data compression, specifically text compression. I chose this problem after a long and roundabout process of philosophical-statistical reflection.
My goal is to demonstrate what I call the "reusability hypothesis", which states that the tools/theories that are useful for compressing a database are also useful for all other questions of interest regarding the database. For example, in relation to text, the hypothesis suggests that knowledge of POS tagging, parsing, and topic recognition (typical problems in SNLP) is also useful for compression.
If this hypothesis is true, it has important implications. Because the compression problem "includes" many other problems, researchers can simply focus their efforts on compression, and they will solve the other problems of interest along the way. This means, among other things, that there is no point in building labeled datasets such as the Berkeley Segmentation Dataset or the Penn Treebank.
↑ comment by snarles · 2011-05-15T13:29:52.629Z · LW(p) · GW(p)
How much overlap does the "reusability hypothesis" have with the "minimum description length (MDL) principle"?
↑ comment by Daniel_Burfoot · 2011-05-15T15:21:29.602Z · LW(p) · GW(p)
Perhaps a better question is: how much overlap does RH have with "empirical science"? My answer is: a lot. By doing empirical science - that is, open-ended, curiosity-motivated research whose goal is to obtain a single optimal description of a particular phenomenon - it just so happens that one also finds theories of direct relevance for practical applications. The same theory that describes the motion of the planets and the velocities of balls rolling down inclined planes is also useful for building bridges (reusability). All mainstream empirical scientists recognize this; no one in computer vision or computational linguistics does.
The link to MDL is that to do lossless compression, one must do empirical science. This is because of the NFL (no-free-lunch) theorem, which says lossless compression is impossible for general inputs. The only way to achieve compression is to discover empirical regularities in the data and then exploit those regularities. For example, the PNG image format exploits the obvious regularity that the values of adjacent image pixels tend to be correlated.
So MDL implies empirical science which implies RH.
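To make the PNG point concrete, here is a toy sketch (not actual PNG filtering, just the same 'predict from the previous value' idea):

```python
import zlib

# a slowly varying 'image row': a stand-in for real pixel data
row = bytes(i // 4 for i in range(1024))

# Sub-style prediction: store each value as the difference from its neighbor
deltas = bytes((row[i] - row[i - 1]) % 256 for i in range(1, len(row)))

print(len(zlib.compress(row)), len(zlib.compress(deltas)))
# the filtered version compresses to noticeably fewer bytes, because the
# predictor captures a real regularity (adjacent values are correlated)
```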
↑ comment by Miller · 2011-05-15T06:11:36.174Z · LW(p) · GW(p)
the hypothesis suggests that knowledge of POS tagging, parsing, and topic recognition (typical problems in SNLP) is also useful for compression.
Did you reverse this? e.g. compression is useful for POS tagging, etc.
↑ comment by Daniel_Burfoot · 2011-05-16T02:30:21.276Z · LW(p) · GW(p)
Did you reverse this? e.g. compression is useful for POS tagging, etc.
No, the original is correct. Let's say you want to compress a text corpus. Then you need a good probability model P(s) over sentences, one that assigns high probability to the sentences in the corpus. Since most sentences follow grammatical rules, the model P(s) will need to reflect those rules in some way. The better you understand the rules, the better your model will be, and the shorter the resulting codelength. So because parsing (and POS tagging, etc.) is useful for compression, the compression principle lets you evaluate your understanding of parsing (etc.).
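Here is a toy sketch of the model-to-codelength link, using an illustrative unigram model with no smoothing (nothing like a serious language model):

```python
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
counts = Counter(corpus)
total = sum(counts.values())

def codelength_bits(sentence):
    """Ideal codelength of a sentence under the unigram model, in bits:
    -log2 P(s). An arithmetic coder gets arbitrarily close to this.
    (No smoothing, so unseen words would break it -- toy example only.)"""
    return sum(-math.log2(counts[w] / total) for w in sentence.split())

print(codelength_bits("the cat sat on the rug ."))
# a better model of English (POS, syntax, topic) assigns higher probability
# to real sentences, which directly translates into fewer bits
```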
↑ comment by khafra · 2011-05-16T18:13:13.082Z · LW(p) · GW(p)
If your project works, then, it will reduce science to a problem in NP?
↑ comment by Daniel_Burfoot · 2011-05-16T19:21:30.791Z · LW(p) · GW(p)
That's what other compression advocates want to do. In fact, it's pretty obvious that this is true - if you found an algorithm capable of computing the Kolmogorov complexity of any input string, you would pretty much put all theoretical scientists out of work. Unfortunately or not, this problem is uncomputable. Some people (Hutter et al) think, however, that it is possible to find good general purpose algorithms for computing approximations to the K-complexity.
I disagree - I think such algorithms are far out of reach. To me, the compression idea is interesting because it provides a new way to evaluate one's understanding of various phenomena. I want to study text compression because I am interested in text, not because I think it will lead to some grand all-purpose algorithm. This is in the spirit of The Virtue of Narrowness.
↑ comment by syllogism · 2011-05-17T09:10:32.857Z · LW(p) · GW(p)
About the NLP stuff:
You seem to be speculating that a solution to parsing is going to pop out of understanding compression. Your argument is that parsing is essentially a problem of modelling P(s), and compression is just that. But I'm skeptical that the compression angle is going to give you special insight. People have been keenly interested in language models since approximately forever, since they're critical for translation and other applications. They're pretty much the bedrock of the field.
I'd note, though, that text compression would be a useful evaluation technique for language technologies. There are lots of competing encoding schemes for grammars, and it's difficult to compare them, which makes the tools themselves difficult to compare. "Which one compresses text best?" seems like a very good solution. I don't know whether it's been proposed before. It sounds familiar, but that might just be because good ideas have that sort of resonance sometimes.
↑ comment by jsalvatier · 2011-05-15T19:58:47.707Z · LW(p) · GW(p)
Would you say that compression 'includes' many other problems or that compression and those problems (I assume you mean things like statistics etc.) are isomorphic (or partially anyway)? Why do you recommend focusing on compression and applying compression techniques to other fields instead of vice versa or both at once? Is there a reason you see compression as more fundamental?
↑ comment by Daniel_Burfoot · 2011-05-16T02:39:45.179Z · LW(p) · GW(p)
I would say that if you understand phenomenon X, you should be able to build a specialized compressor for phenomenon X that will achieve a short codelength. The study of compression algorithms for datasets produced by phenomenon X is effectively identical to the study of phenomenon X. Furthermore compression provides a strong methodological advantage: there can be no hand-waving; it is impossible to "cheat" or self-deceive.
↑ comment by PhilGoetz · 2011-05-15T08:16:29.937Z · LW(p) · GW(p)
Compression is cognition. Gerry Wolf has also been saying this for many years. I haven't studied his work closely enough or recently enough to comment on it.
↑ comment by Daniel_Burfoot · 2011-05-15T15:26:37.336Z · LW(p) · GW(p)
Lots of people promote the compression idea. But as far as I know, with the exception of the Hutter Prize, no one has actually attempted to do large scale compression, in any kind of systematic way. People in SNLP talk about language modeling, but they are not systematic (no shared benchmarks, no competitions, etc). No one in computer vision attempts to do large scale image compression.
↑ comment by khafra · 2011-05-16T18:39:58.770Z · LW(p) · GW(p)
I just spent an enjoyable lunch break reading about the Hutter Prize. Do you know if the paper that's linked (the link is broken) from a few places in its FAQ, which seems to be disparaging Occam's Razor, is still online anywhere? A mailing list entry is the most complete summary of it I could find.
comment by Plasmon · 2011-05-15T06:25:20.607Z · LW(p) · GW(p)
I work on the ITER project; specifically, I develop algorithms for multiscale simulation of fusion plasmas.
I consider it not unlikely that nuclear fusion will be a useful source of energy sometime this century. And if some game-changing event happens before that, well, so be it (possibilities include FAI, one of a large variety of small-scale fusion devices turning out to be practical, the greens finally getting their act together and producing truly large amounts of "green" energy, ...).
On a meta level: why are comments in this thread being upvoted? Should I conclude that the LW community thinks data fitting is 5/3 times as interesting as Precambrian atmospheric dynamics?
comment by InquilineKea · 2011-05-15T03:59:51.425Z · LW(p) · GW(p)
My upcoming project (over the summer) is about studying the atmospheric dynamics of the Precambrian Earth using 3D atmospheric-science models (probably the CAM3 models often used for modelling the Earth's atmosphere). I am then planning to move on to a project to model the 3D dynamics of Earth's atmosphere around various types of stars, which should help us predict what the atmosphere of an Earth-like exoplanet would look like (spectrally) around various star types.
So basically, my primary interests are astronomy and atmospheric science (I'm also really into computer science and statistics, which should help with both). I could say that astrobiology merges them all, but I'm unsure whether anything astrobiologically significant will ever come out of it.
I'm also into super-long-term sustainability, which makes this very relevant (most of the shorter-term sustainability areas are already clogged up with people).
↑ comment by snarles · 2011-05-15T13:26:13.648Z · LW(p) · GW(p)
What kind of models are used for modelling the atmosphere? Do you use lots of finite-element methods?
↑ comment by InquilineKea · 2011-05-15T19:49:50.958Z · LW(p) · GW(p)
Well, the kind of model I'm using is http://www.cesm.ucar.edu/models/atm-cam/docs/description/. I think it uses finite-element methods - I haven't gotten into the raw details of it yet, but I'll read them once school is over. There's also a radiative transfer part that can get quite complex (especially once you get into the photochemistry of molecules other than ozone with higher fluxes of higher-frequency light).
There are other models that use atmosphere+ocean coupling, but for now we'll stick to a slab-ocean model, since full coupling would probably require a supercomputer.
Most of the papers so far have only been on energy-balance models, which are really just 1D models to predict the climate of the planet as a whole.
Here's another example of similar work that has been done (recently) on tidally-locked planets: http://astrobites.com/2011/05/10/pack-your-suitcase-super-earth-gliese-581d-is-in-the-%E2%80%98habitable-zone%E2%80%99/
comment by TimFreeman · 2011-05-15T18:44:06.280Z · LW(p) · GW(p)
I have a Friendly AI proposal at http://www.fungible.com/respect. It got some good feedback on the SL4 list some time back, so I have a list of bugs to fix in the proposal that have not yet been fixed. They do seem fixable, though.
Assuming they are fixable, we'd have a specification for a Friendly AI that would be mathematically well-defined, but not easily implementable.
It takes an ordinary approach to cause and effect, so if it were confronted by Newcomb's problem it would either self-modify or lose, depending on whether the entity in control in that story manages to read its mind before it is able to self-modify.
Furthermore it assumes that there is a real world and that the humans it is responsible for care about the portion of the real world that they interact with, so humans concerned about simulations or about events happening to copies of themselves that are unreasonably far away won't be well served by it.
(There are lots of more practical issues on the bug list, too, and frankly I'm not very concerned about the issues stated above.)
I have a day job that can't plausibly be called research.
comment by jsalvatier · 2011-05-15T07:53:11.325Z · LW(p) · GW(p)
For everyone, please link to your research if possible!
comment by syllogism · 2011-05-17T00:11:18.594Z · LW(p) · GW(p)
- Level: junior post-doc
- Field: Computational linguistics / Statistical Natural Language Processing
- Subfield: Statistical parsing
- Research program: Combinatory Categorial Grammar
Essentially I work on a compiler for English. Natural languages have very ambiguous grammars though, so probability models are critical.
Sample paper (unfortunately 20 levels deep and not of much general interest): http://www.aclweb.org/anthology/P/P10/P10-1022.pdf
comment by hwc · 2011-05-29T13:32:05.479Z · LW(p) · GW(p)
I just started a summer position researching scientific visualization. The challenge is to be able to deal with ensembles of large datasets that result from simulations run with varying parameters.
comment by Jonathan_Graehl · 2011-05-18T07:14:38.860Z · LW(p) · GW(p)
My research is mostly in service of getting syntax (and, lately, semantics) into statistical machine translation.