
The meaning of “Non-primary hits” and “tags”


Could you help me understand the meaning of "Non-primary hits" and "tags" in the following paragraph from the RSeQC documentation?

The result table reports total number of reads (excluding nonprimary hits) and tags (separate splice fragments of a read). Total assigned tags indicate how many tags can be assigned unambiguously to the ten different categories listed below.

Total Reads             49743155
Total Tags              63012643
Total Assigned Tags     57529077
==========================================================
Group            Total_bases    Tag_count    Tags/Kb
CDS_Exons        36821030       34763281     944.11
5'UTR_Exons      34901580       2856644      81.85
3'UTR_Exons      54908278       9772738      177.98
Introns          1450606807     8468986      5.84
TSS_up_1kb       31234456       94103        3.01
TSS_up_5kb       139129272      161914       1.16
TSS_up_10kb      249300845      217980       0.87
TES_down_1kb     32868738       789703       24.03
TES_down_5kb     142432117      1368378      9.61
TES_down_10kb    251276738      1449448      5.77
==========================================================


"Tag" is an odd term in this context. It has historically meant "read", but RSeQC uses it differently. Suppose you have a read that aligns to a single contiguous stretch of DNA (possibly with InDels). This counts as one tag. Suppose instead that you have a spliced read, that is, one where part aligns to one exon and part to another exon. Each aligned part of this read then counts separately as a tag, so this 1 read becomes 2 tags. By extension, if a read is spliced such that it starts in exon 1, jumps to exon 2, and finally jumps to exon 3, it counts as 3 tags. The tag count is only useful if you're interested in splicing, since the higher this value is, the better you're able to discern differences there.

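As a rough illustration of the counting rule described above, here is a minimal Python sketch that counts tags from a CIGAR string by treating every `N` operation (a skipped region, i.e. a splice junction) as a split point. The function name `count_tags` and the example CIGAR strings are made up for illustration, and this follows the description in this answer rather than RSeQC's actual code, which may treat some operations differently.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def count_tags(cigar: str) -> int:
    """Count aligned segments of a read, splitting only at N (splice) ops.

    Assumption: insertions and deletions do NOT split a tag, matching the
    description in the answer (not necessarily RSeQC's implementation).
    """
    if cigar == "*":  # no alignment information available
        return 0
    ops = CIGAR_OP.findall(cigar)
    return 1 + sum(1 for _, op in ops if op == "N")

# Hypothetical reads: unspliced, unspliced with a deletion,
# spliced once, spliced twice.
for cigar in ["100M", "48M2D52M", "60M1500N40M", "30M200N50M900N20M"]:
    print(cigar, count_tags(cigar))
```

Under this rule an unspliced read always contributes one tag, so Total Tags can only be equal to or larger than Total Reads.
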
For "non-primary hits", suppose you have a read that can align equally well to multiple places in the genome. If there are 5 equally good alignments, then 4 of them will be labeled as "secondary" (in SAM/BAM files these carry the 0x100 flag bit) and the remaining one will not; that one is the "primary alignment". "Non-primary hits" is then an odd way of saying "secondary alignments". The number of primary alignments is equal to the number of reads that aligned.
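
To make the primary/secondary distinction concrete, here is a small Python sketch that checks the secondary-alignment bit (0x100) of the SAM FLAG field. The flag values in the list are hypothetical, chosen to mimic one read with 5 equally good alignments; the helper name `is_secondary` is also just for illustration.

```python
SECONDARY_FLAG = 0x100  # SAM FLAG bit that marks a secondary alignment

def is_secondary(flag: int) -> bool:
    """Return True if the SAM FLAG has the secondary-alignment bit set."""
    return bool(flag & SECONDARY_FLAG)

# Hypothetical FLAG values for the 5 alignments of a single read:
# 0 = primary, forward strand; 256 = secondary; 272 = secondary, reverse strand.
flags = [0, 256, 272, 256, 272]

n_primary = sum(not is_secondary(f) for f in flags)
n_secondary = sum(is_secondary(f) for f in flags)
print(n_primary, n_secondary)  # prints "1 4"
```

Counting only the records where this bit is unset gives one record per aligned read, which is why excluding non-primary hits yields a read count rather than an alignment count.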