What is a Sequencing Read?

post by jefftk (jkaufman) · 2023-10-25T02:10:11.443Z · LW · GW · 2 comments

Probably the most common form of genetic sequencing these days is "paired-end" sequencing. It's very impressive: the sequencing machine can process the same nucleic acid fragment from both ends! This means that each observation looks like:

+--------------+---------+--------------+
| forward read |   gap   | reverse read |
+--------------+---------+--------------+

Because accuracy ("quality") tends to drop off as you sequence further into a fragment, sequencing from both ends gives you much more accurate data than trying to sequence the whole thing from one end. And because we build up larger sequences ("contigs") by piecing together overlapping ones ("assembly"), two sequences of bases separated by a gap are usually more helpful than the same number of bases without a gap.

It's common to refer to paired-end sequencing with designations like "2x150", where the "2x" tells us it's paired-end, and the "150" tells us it reads for 150 bases from each end, for a total of 300 bases per fragment.
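As a rough sketch of that arithmetic, assuming a designation always has the form "<ends>x<read length>" (the function name here is made up for illustration):

```python
def bases_per_fragment(designation: str) -> int:
    """Total bases read per fragment for a designation like "2x150"."""
    # Assumes the form "<ends>x<read length>", e.g. "2x150" or "1x75".
    ends, read_length = designation.lower().split("x")
    return int(ends) * int(read_length)


print(bases_per_fragment("2x150"))  # 300
```

This counts bases read, not distinct positions covered: if the fragment is short enough that the two reads overlap, some positions are read twice.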

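But this introduces a terminology question: what is a read? When we only had "single-end" sequencing it was clear: each sequenced fragment, each contiguous sequence of bases, was a read. With paired-end sequencing, however, these are no longer the same thing! A "read" could mean either a single contiguous sequence of bases (a forward read or a reverse read on its own) or a single sequenced fragment (a forward read and its reverse read taken together).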

For example, say we have:

>SRR14530724.2 2/1
CATTTTCGACGGCGTCGATGTACAAAGGTTATACCATAGTAAGTCCGAAGC
TACAGGCTTATGACACCGCAGAGTCAATGTATTCCGGTGACAATGTACTGA
TGTACAGTGGGACTGACACTGTCTCTTATACACATCTCCGAGCCCACGA
>SRR14530724.2 2/2
TGTCAGTCCCACTGTACATCAGTACATACACACCGGAATACATTGACTCTG
CGGTGTCATAAGCCTGTAGCTTCGGACTTACTATGGTATAACCTTTGTACA
TCGACGCCGTCGAAAATGCTGTCTCTTATACACATCTGACGCTGCCGAC

This is a forward read (SRR14530724.2 2/1) and a reverse read (SRR14530724.2 2/2) that together comprise a single observation of a fragment from the sample and would generally be analyzed together. Does this count as one read or two?

Turns out people do both, and it leads to a lot of misunderstandings!
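To make the two counts concrete, here's a minimal sketch that reports both numbers for FASTA-style input. It assumes every header looks like the ones above (">ID NAME/MATE", with mate 1 for forward and 2 for reverse); the function name is hypothetical and the sequences in the usage example are truncated for brevity.

```python
from collections import defaultdict


def count_reads_and_pairs(fasta_text: str) -> tuple[int, int]:
    """Return (number of individual reads, number of read pairs)."""
    mates_by_id = defaultdict(set)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Assumed header layout: ">ID NAME/MATE", e.g. ">SRR14530724.2 2/1"
        read_id, name_and_mate = line[1:].split()
        mate = name_and_mate.rsplit("/", 1)[1]
        mates_by_id[read_id].add(mate)
    # Every forward or reverse read counts once here.
    n_reads = sum(len(mates) for mates in mates_by_id.values())
    # Only IDs with both a forward and a reverse read count as a pair.
    n_pairs = sum(1 for mates in mates_by_id.values() if mates == {"1", "2"})
    return n_reads, n_pairs


example = """\
>SRR14530724.2 2/1
CATTTTCGACGG
>SRR14530724.2 2/2
TGTCAGTCCCAC
"""
print(count_reads_and_pairs(example))  # (2, 1)
```

The same input is "two reads" under one definition and "one read" under the other, which is exactly the factor-of-two confusion.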

Some examples:

This is a mess! And, to make it worse, as far as I can tell there's no standard term other than "read" for either "what both the forward and reverse read are examples of" or "what the forward and reverse read are when considered together".

I've been using "read" to mean "read pair", but given the ambiguity I think I should switch to another term. The NCBI SRA uses "spots", but no one else seems to use this terminology. You can just say "read pair", which is pretty good, but a bit long. Possibly "pairs" or "mates" would be good? Thoughts?


2 comments


comment by Metacelsus · 2023-10-25T20:29:38.271Z · LW(p) · GW(p)

"Paired-end reads" is the most common way to phrase it (at least in my experience). One paired-end read comprises both ends and is equivalent to a read pair. I agree that "read pair" may be less confusing.

Replies from: jkaufman
comment by jefftk (jkaufman) · 2023-10-25T22:51:12.725Z · LW(p) · GW(p)

I wish "paired end read" always meant that, but I've encountered people saying "N paired end reads" to mean "N reads generated by paired end sequencing", where "read" is either a forward or reverse read.

For example, copying from a recent email: "100 M paired end reads (50 M in each direction)"