Whole genome sequencing vs SNP genotyping

post by Vlad Sitalo (harcisis) · 2015-06-11T22:09:06.044Z · LW · GW · Legacy · 9 comments

Hi, I have been considering some kind of genetic testing for a while now, but I hesitated because my financial resources are quite limited at the moment. Eventually I figured that I would do it in the future anyway, so why not just do it right away? Especially taking into account that knowing some data available via such tests in an earlier stage of life and acting on it could be quite beneficial in a long run.

So after that realization I thought about going with 23andMe and getting SNP genotyping. But while browsing related information I came across an article about the rapid decrease in the price of whole genome sequencing (WGS). After that I found this listing: https://www.scienceexchange.com/services/whole-genome-seq?page=2 and the lowest price tag there is $795 (but it's not clear whether that particular option is applicable to human genomes).

As a result, the whole WGS idea seems quite appealing, but I have some concerns. My finances are still quite limited, and WGS seems to cost at least 10x as much as SNP genotyping. The other concern is data applicability (more on this below).

Information about what you can do with WGS data, and how, is somewhat scarce on the Internet. So I have some questions:

I know that for SNP tests there are tools like Promethease (http://www.snpedia.com/index.php/Promethease) that allow you to analyze your data. But I was not able to find anything about tools for analyzing your WGS data.

So are there any good tools available for analyzing your own WGS data? And is it computationally feasible to get results in a reasonable time on an ordinary PC?

And finally, with all of the above said: should I go for whole genome sequencing, go for SNP genotyping, or is my whole line of reasoning invalid, and I should wait and get WGS in a few years at a lower price?

(My main concern here is a positive influence on health in the long run.)

Thanks.

9 comments

Comments sorted by top scores.

comment by Elo · 2015-06-12T00:08:01.477Z · LW(p) · GW(p)

In terms of simply WGS now vs WGS a little later (I estimate the cost will be <$200 in <10 years), the most important questions I would be asking are:

Am I going to do anything different now with the WGS knowledge that I won't be able to do later (in <10 years)?

Considering I am a live (and still alive) human, will WGS find a disease that could drastically change my lifestyle? (Some could, e.g. possible heart problems, possible cancer risks, etc.)

Procreation status of my family: Do I have kids? Will knowing this before I have kids change the way I have kids (e.g. discovering that I am a carrier of a disease)? This also applies to biological relatives and their kids. Yes, get tested before having kids.

What powerful and useful things have others done with their WGS knowledge? Can what they have learnt be usefully applied to my situation once I have my WGS?

Meta: health, importantly: if this expensive medical test came back with an indication that I should be doing X, would it be cheaper to just do X and not get the test? (For some medical tests X might be living a healthier lifestyle, but it might also be installing a pacemaker before symptoms, splinting the bone without knowing if it's a hairline break or a bruise, taking extra vitamins, amputating the leg...)

As much as I love data, and WGS would give you a whole bunch of data to play with, what would you DO with that data? Would it be anything that you can't do in 10 years for 1/10th the price and a similar opportunity cost? As long as you have these questions answered to your satisfaction, do it!

comment by zslastman · 2015-06-17T15:48:45.459Z · LW(p) · GW(p)

WGS is going to get cheaper and cheaper as time goes on, and presumably in the future we'll have developed a process for analysing the results properly. In the meantime, there isn't much to be gained from it. SNP genotyping gives you most of the info about common variants, because the things it doesn't catch (deletions, insertions, etc.) will generally have some SNP in linkage with them. The rare variants are what you miss, and right now we don't really know what to do with them.

In general I wouldn't overestimate how much genotyping will tell you. Your family history is likely to be more informative.

comment by Douglas_Knight · 2015-06-12T15:13:45.759Z · LW(p) · GW(p)

Especially taking into account that knowing some data available via such tests in an earlier stage of life and acting on it could be quite beneficial in a long run.

Whole genome data won't give you health information beyond SNPs because the studies used to generate the health claims use SNPs, because they're cheap.

But there is very little useful data available from SNPs, anyhow. What kind of data do you believe is available and what actions might you take?

If the baseline is a 30% chance of dying of heart disease and it told you that you had a 70% chance, what would you do differently? (but it won't.) Probably you should already be doing it because 30% is a big number!

If the baseline is 0.1% chance of MS and it told you that you have a 0.2% chance of MS, what would you do differently?
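The two hypotheticals above can be put side by side with a little arithmetic. This is only a sketch using the figures quoted in this comment; the function name `risk_change` is made up for illustration. It separates the relative risk (how many times the baseline) from the absolute increase (how much extra probability), which is the distinction the examples turn on.

```python
def risk_change(baseline: float, personal: float) -> dict:
    """Compare a personal risk estimate against the population baseline."""
    return {
        "absolute_increase": personal - baseline,  # extra probability in absolute terms
        "relative_risk": personal / baseline,      # multiple of the baseline
    }

# Figures taken from the hypotheticals above, expressed as probabilities.
heart = risk_change(0.30, 0.70)
ms = risk_change(0.001, 0.002)

for name, r in (("heart disease", heart), ("MS", ms)):
    print(f"{name}: x{r['relative_risk']:.2f} relative, "
          f"+{r['absolute_increase'] * 100:.1f} percentage points absolute")
```

Both cases roughly double the risk, but the absolute change is 40 percentage points in one and 0.1 in the other.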

The only thing it might tell you that is useful is that you have a high chance of developing what is otherwise a rare disease. It will probably tell you that you have a mildly broken BRCA and have a 22% chance of breast cancer, rather than a 20% chance. That is worthless, even if it is true. But if you're Ashkenazi, there is a 1% chance that it will tell you that you have a fully broken BRCA. That means an 80% chance of breast cancer, which is really not that different from a 20% chance. But it also means aggressive breast cancer at a young age. This is really a different disease. It is a substantial chance of death at a young age. Many women with such a mutation get prophylactic mastectomies. The Dutch have the same 1% base rate as Ashkenazi, but 23andMe does not test for the Dutch mutations (at least as of v3). Thus whole genome sequencing can detect the 1% of Dutch with fully broken BRCA and the 1/10,000 of other populations. But those base rates are silly. If you have this gene in your family, you should know already from the high prevalence of breast cancer, especially fatal breast cancer at a young age.

Also, there's a SNP that predicts how fast you metabolize drugs. That sounds useful, but no one takes advantage of it. More generally, people rarely experiment with doses of drugs, though they really should.

If you do get a whole sequence, you can test every gene to see if it is fully broken ("nonsense mutation" as opposed to the mild "missense"). If you have such a gene, it probably is a big deal, but it will probably be hard to interpret.
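To make the nonsense vs missense distinction concrete, here is a minimal sketch that classifies a single-codon change using the standard genetic code. The function name `classify_substitution` and the example codons are illustrative assumptions; a real pipeline would also need gene annotations, reading frames and strand handling, none of which is covered here.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"    # premature stop codon, protein truncated
    if ref_aa == "*":
        return "stop-loss"   # original stop codon removed
    return "missense"        # one amino acid swapped for another

print(classify_substitution("CGA", "TGA"))  # Arg -> stop
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
```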

Replies from: Lumifer
comment by Lumifer · 2015-06-12T15:34:45.507Z · LW(p) · GW(p)

If the baseline is a 30% chance of dying of heart disease and it told you that you had a 70% chance, what would you do differently? (but it won't.) Probably you should already be doing it because 30% is a big number!

That line of argument is flawed because actions have costs.

To give a simple example (and discussing whether it's precisely correct is beside the point), you can take baby aspirin to reduce the risk of heart attacks. Aspirin is a blood thinner: it makes clots (which cause heart attacks) less likely. However, decreasing coagulation is not an unmitigated blessing. If you get internal bleeding -- e.g. a blood vessel ruptured in your brain -- that aspirin you've been taking could make things significantly worse. It's a trade-off.

Given this, you want to know on which side (basically, heart attacks vs. strokes) your personal risk is highest. That is actionable knowledge.

an 80% chance of breast cancer, which is really not that different from a 20% chance

Looks very different to me -- you are quite cavalier with a fourfold difference in odds...

Replies from: Douglas_Knight
comment by Douglas_Knight · 2015-06-12T16:46:51.710Z · LW(p) · GW(p)

4 is a small number. It is pretty rare that a cost-benefit calculation cares about that factor of 4, i.e. that multiplying the benefits by 4 will change the decision from reject to accept.
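Spelled out as a sketch: under a simple "accept if benefit exceeds cost" rule (an assumed simplification, with made-up costs in arbitrary units), scaling the benefit by 4 flips the decision only when the cost falls between the original benefit and four times it.

```python
def worth_doing(benefit: float, cost: float) -> bool:
    """Accept an intervention when its expected benefit exceeds its cost."""
    return benefit > cost

def decision_flips(benefit: float, cost: float, factor: float = 4.0) -> bool:
    """True when scaling the benefit by `factor` turns a reject into an accept."""
    return (not worth_doing(benefit, cost)) and worth_doing(benefit * factor, cost)

# Hypothetical costs, in the same arbitrary units as a benefit of 1.0.
for cost in (0.5, 2.0, 10.0):
    print(cost, decision_flips(1.0, cost))
```

Of the three hypothetical costs, only 2.0 flips: 0.5 was already worth doing, and 10.0 is still not worth doing after the fourfold increase.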

Replies from: Lumifer
comment by Lumifer · 2015-06-12T16:49:53.514Z · LW(p) · GW(p)

It seems we have a different perception of smallness and tend to encounter different cost-benefit calculations.

comment by Strangeattractor · 2015-06-21T05:22:18.874Z · LW(p) · GW(p)

One thing to consider is that whole genome sequencing, as it exists now, can have errors in up to 30% of the data. There was a paper in 2014 comparing the results for the same DNA on the two main competitors' machines, and the results didn't match a large percentage of the time. I don't recall the name of the paper or the author, or I'd link to it.

Another thing to consider is the possibility of putting one's DNA into a research database under a Creative Commons license, to be used by the public at large (mostly researchers). Anonymity would be attempted, but could not be guaranteed. The Personal Genome Project http://www.personalgenomes.org is looking for volunteers in many countries. Depending on the country, and the funding available, it may be a way to get the data without having to fund it all yourself.

As for what you can do with it... I think at the moment mostly what you can do is search for scientific articles on areas of interest to you. I think the data will become more useful over time.

comment by Marlon · 2015-06-12T15:40:51.316Z · LW(p) · GW(p)

The main idea of WGS is having >all< the SNPs whereas you only get the most common SNPs with the SNP tests.

I'm not really sure how you would use the data from WGS (let's say the genome is assembled too, or maybe that would cost more?). You would probably use BLAT on your local machine to search for genes with known SNPs. I don't think you could do anything more (finding novel SNPs is out of reach here).
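As a rough illustration of looking up known SNPs in your own data, here is a sketch of a different route than BLAT. It assumes the sequencing provider delivers variant calls as a single-sample VCF text file, which is not something stated in this thread. The function name `genotypes_for` and the rsIDs in the example are placeholders, not real findings.

```python
import io

def genotypes_for(vcf_lines, wanted_ids):
    """Return {rsID: (ref, alt, genotype)} for requested IDs found in a VCF stream."""
    wanted = set(wanted_ids)
    found = {}
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:      # need FORMAT plus at least one sample column
            continue
        ref, alt = fields[3], fields[4]
        ids = set(fields[2].split(";"))  # the ID column may hold several IDs
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        for rsid in ids & wanted:
            found[rsid] = (ref, alt, sample.get("GT", "."))
    return found

# Tiny made-up example standing in for a real file opened with open(path).
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n"
    "1\t1000\trs0000001\tA\tG\t50\tPASS\t.\tGT\t0/1\n"
    "2\t2000\trs0000002\tC\tT\t50\tPASS\t.\tGT\t1/1\n"
)
hits = genotypes_for(example, {"rs0000002"})
print(hits)
```

One caveat: an ordinary VCF usually lists only the sites where the sample differs from the reference, so a missing rsID most likely means "same as reference or not called", not necessarily "not sequenced".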

I would guess the main idea would be to be able to check for new SNPs as more and more are found in the literature. However, the literature is not that easy to skim through except for the most common SNPs that are already included in the SNP tests.

Going back on the literature: for most multi-factorial diseases, you will see data coming from GWAS and linkage disequilibrium studies that will be really hard to interpret. A SNP popping up like that does not necessarily mean that you've got the trait associated with it.

My comment was probably not very well organized, but I should still conclude. In my opinion, do a WGS only if you've got enough knowledge of bioinformatics (and I mean an engineer's level). SNP tests are cheap and will provide you with almost everything you could get from a WGS.

Replies from: Douglas_Knight
comment by Douglas_Knight · 2015-06-12T17:07:42.258Z · LW(p) · GW(p)

I agree with your general point, but here is a technical comment: 23andMe tests the million most common SNPs, but that is not the same as the million most common variants, because not all variation is in the form of a SNP. SNP stands for "single nucleotide polymorphism" -- it means that one letter is changed while the context is unchanged. They are easy to detect because of that context, and that ease of detection is why they are used.

Another kind of variation is an insertion or a deletion. They are harder to detect, which is why 23andMe only detects three of them, ones in the BRCA gene that are common among Ashkenazi. It does not attempt to detect even the ones that are equally common among the Dutch. They are easy to detect with whole genome sequencing and they are valuable to detect because they are fairly easy to interpret: the whole protein is ruined. What the protein does and what you can do about it are harder problems, but it's not like finding a new SNP, where it probably means nothing.
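The difference between a SNP and an insertion or deletion can be shown with a short sketch that classifies a variant from its reference and alternate alleles. The function name `variant_kind` and the example alleles are made up for illustration, and real variant callers handle more cases than this.

```python
def variant_kind(ref: str, alt: str) -> str:
    """Classify a variant by comparing reference and alternate allele strings."""
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"        # one letter changed, context unchanged
    if len(ref) < len(alt):
        return "insertion"  # extra letters added
    if len(ref) > len(alt):
        return "deletion"   # letters removed
    return "multi-base substitution"

for ref, alt in (("A", "G"), ("A", "ATC"), ("ATC", "A")):
    print(ref, alt, variant_kind(ref, alt))
```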

A third kind of variation is copy number variation, where there is a repetitive section of the DNA and the number of repeats varies from person to person. But whole genome sequencing today is bad at such regions, at least if the number of repeats is large. A lot of people think that they are important, but the fact that they are hard to measure makes that hard to assess at this time.