Read mapping: Answers to exercise questions
Mapping short reads to a reference using HISAT2
How many RNA-seq read pairs were provided as input to HISAT2?
768 pairs (768*2 = 1536 reads)
From HISAT2 summary output file, or input FASTQ file.
How many of those read pairs were mapped by HISAT2?
693, sum of:
682 concordantly exactly 1 time
0 concordantly >1 times (i.e. multimapping)
11 discordantly
(from HISAT2 summary output)
How many reads were uniquely mapped, i.e. mapped to one genomic location?
1440
Can compute this by running:
grep -cw 'NH:i:1' hisat2.sam
Or if you have made a BAM file:
samtools view hisat2.bam | grep -cw 'NH:i:1'
Or by looking at the summary output file, summing up:
2 * 682 concordantly and uniquely mapped pairs
2 * 11 discordantly and uniquely mapped pairs
54 uniquely mapped reads lacking a mapping for the paired read
In general, do the alignments seem to be good? I.e. do they cover the entire reads and contain few mismatches?
Yes.
Look at mapping quality, CIGAR string and NM tag (edit distance) in the SAM file:
cut -f5-6,12- hisat2.sam | more
Or for the BAM file:
samtools view hisat2.bam | cut -f5-6,12- | more
Mapping short reads to a reference using STAR
How many RNA-seq read pairs were provided as input to STAR?
768 pairs (768*2 = 1536 reads)
(From Log.final.out, or input FASTQ file)
How many of those read pairs were mapped by STAR?
738, sum of
733 uniquely mapped
5 multimapped
(From Log.final.out)
How many reads were uniquely mapped, i.e. mapped to one genomic location?
1466
This can be obtained from Log.final.out (2 * 733 uniqely mapped reads).
Or by running:
grep -cw 'NH:i:1' Aligned.out.sam
Or on a BAM file:
samtools view star.bam | grep -cw 'NH:i:1'
In general, do the alignments seem to be good? I.e. do they cover the entire reads and contain few mismatches?
Yes.
See Log.final.out, in particular "Average mapped length" and "Mismatch rate per base".
Can also use samtools (see the answer to the corresponding question for HISAT above).
Converting SAM files to BAM files
Does the output from samtools flagstat confirm any of your answers to the questions in the HISAT2 and STAR sections above?
Kind of. Note that several of the numbers refer to the number of mappings (not reads) and there can be several mappings per read.
Load the the BAM files with HISAT2 and STAR results into IGV. Go to the RAB11FIP5 locus. Have HISAT2 and STAR mapped the reads in a similar way?
Yes.
Detailed examination of the read alignments in IGV should indicate if the RNA-seq data is strand-specific. Is it?
No.
Right-click on the track name, choose "Color alignments by" -> "First of pair strand".
If the data is strand-specific, one color should dominate for every gene (depending on the strand of the gene).