Samtools: Extract Mapped and Unmapped Paired-end Reads
Samtools can be used for extracting mapped and unmapped paired-end reads from SAM/BAM files.
Unlike single-end read filtering, extracting paired-end reads requires you to consider whether the reads are properly paired and whether both reads of a pair are mapped.
Paired-end reads are properly paired (concordant alignments) when both reads are mapped to the reference genome in the correct orientation as per the library preparation protocol (e.g. first read on the forward strand and second read on the reverse strand). In addition, properly paired reads have the expected insert size (distance between the mapped positions of the read pair).
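The pairing and mapping status described above is stored in the FLAG field of each SAM/BAM record as a sum of bit values. The following is a minimal sketch of how such a value decodes, using the bit values from the SAM specification; describe_flag and its label names are hypothetical helpers for illustration and are not part of samtools.

```shell
# Print the names of the SAM flag bits that are set in a decimal FLAG value.
# describe_flag is a hypothetical helper, not a samtools command.
describe_flag() {
    flag=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}

describe_flag 99    # first read of a proper pair, mate on the reverse strand
describe_flag 147   # second read of that pair, itself on the reverse strand
```

samtools also provides a samtools flags subcommand that converts between numeric and textual flag values, which is likely the more convenient choice in practice.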
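You can use the samtools view command with the -f or -F parameter and the associated flag values for extracting mapped and unmapped paired-end reads from SAM/BAM files. The -f parameter keeps only reads that have all of the given flag bits set, and the -F parameter excludes reads that have any of the given flag bits set.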
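The effect of -f and -F can be illustrated with plain shell arithmetic. This is only a sketch of the bit test, assuming the usual semantics (-f requires all listed bits, -F rejects any listed bit); passes_filter is a hypothetical function name and does not call samtools.

```shell
# passes_filter FLAG REQUIRE EXCLUDE
# Exit status 0 when a record with this FLAG would be kept by
# `samtools view -f REQUIRE -F EXCLUDE` (hypothetical helper).
passes_filter() {
    flag=$1; require=$2; exclude=$3
    # -f: every bit in REQUIRE must be set in FLAG
    [ $(( flag & require )) -eq "$require" ] || return 1
    # -F: no bit in EXCLUDE may be set in FLAG
    [ $(( flag & exclude )) -eq 0 ] || return 1
    return 0
}

if passes_filter 99 2 0; then echo "flag 99 kept by -f 2"; fi
if ! passes_filter 73 2 0; then echo "flag 73 dropped by -f 2"; fi
```

Passing 0 for REQUIRE or EXCLUDE corresponds to leaving that option off the command line.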
If you have not installed samtools, read this article on installing samtools.
The following examples demonstrate how to extract mapped and unmapped paired-end reads from the BAM file using samtools.
Extract paired reads mapped in the proper pair
You can use the following command to extract the reads mapped in the proper pair.
samtools view -b -f 2 input.bam > mapped.bam
Where the -b parameter specifies that the output should be in BAM format, and the -f 2 parameter keeps only the paired-end reads flagged as mapped in a proper pair.
The above command will create a new BAM file, mapped.bam, which will contain the paired-end reads mapped in a proper pair.
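If you want to create an output file in SAM format, you can use the following command. Without -b, samtools view writes SAM records; add -h if the header lines should be included as well.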
samtools view -f 2 input.bam > mapped.sam
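To see what the -f 2 selection does to individual records, the sketch below applies the same bit test to a toy tab-separated stream containing only read names and FLAG values. The read names r1 and r2, the flag values, and the keep_with_bits function are all made up for illustration; real SAM records have at least 11 columns.

```shell
# Keep only alignment lines whose FLAG (column 2) has all bits of REQUIRE set.
# Header lines (starting with @) are skipped, as `samtools view` does by
# default when -h is not given. keep_with_bits is a hypothetical helper.
keep_with_bits() {
    require=$1
    while IFS="$(printf '\t')" read -r qname flag rest; do
        case $qname in @*) continue ;; esac
        if [ $(( flag & require )) -eq "$require" ]; then
            printf '%s\t%s\n' "$qname" "$flag"
        fi
    done
}

# Toy input: r1 is a proper pair (99/147), r2 has one unmapped read (73/133).
printf '%s\t%s\n' '@HD' 'VN:1.6' r1 99 r1 147 r2 73 r2 133 | keep_with_bits 2
```

Only the two r1 records pass, because bit 2 is not set for either read of r2.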
Extract paired reads where one read is mapped and the other is unmapped
You can use the following command to extract the paired-end reads where one read is mapped and the other read is unmapped.
samtools view -b -F 4 -f 8 input.bam > mapped.bam
Where the -b parameter specifies that the output should be in BAM format, the -F 4 parameter excludes reads that are themselves unmapped, and the -f 8 parameter keeps only reads whose mate is unmapped.
The above command will create a new file, mapped.bam, which will contain the mapped reads whose mate is unmapped, i.e. the mapped read from each pair where one read is mapped and the other is unmapped.
If you want to create an output file in SAM format, you can use the following command.
samtools view -F 4 -f 8 input.bam > mapped.sam
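Because each read of a pair is a separate record with its own FLAG, the -F 4 -f 8 combination selects only the mapped read of such a pair; the unmapped mate would be matched by the reverse combination, -f 4 -F 8. The sketch below classifies a record from bits 4 and 8 alone. It assumes the record is paired (bit 1 set), and classify_read is a hypothetical helper, not a samtools feature.

```shell
# classify_read FLAG: report the mapping status of a paired read and its mate
# from bit 4 (read unmapped) and bit 8 (mate unmapped). Assumes bit 1 is set.
classify_read() {
    flag=$1
    unmapped=$(( flag & 4 ))
    mate_unmapped=$(( flag & 8 ))
    if [ "$unmapped" -eq 0 ] && [ "$mate_unmapped" -ne 0 ]; then
        echo "mapped read, mate unmapped (kept by -F 4 -f 8)"
    elif [ "$unmapped" -ne 0 ] && [ "$mate_unmapped" -eq 0 ]; then
        echo "unmapped read, mate mapped (kept by -f 4 -F 8)"
    elif [ "$unmapped" -ne 0 ]; then
        echo "both reads unmapped (kept by -f 12)"
    else
        echo "both reads mapped"
    fi
}

classify_read 73    # 1 + 8 + 64: first in pair, mapped, mate unmapped
classify_read 133   # 1 + 4 + 128: second in pair, unmapped, mate mapped
```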
Extract paired-end reads where both reads are not mapped
You can use the following command to extract the paired-end reads where both reads are not mapped to the reference genome.
samtools view -b -f 12 input.bam > unmapped.bam
Where the -b parameter specifies that the output should be in BAM format, and the -f 12 parameter (4 + 8) keeps only reads that are unmapped and whose mate is also unmapped.
The above command will create a new file, unmapped.bam, which will contain the paired-end reads where both reads of the pair are not mapped.
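The value 12 is the combination of the "read unmapped" bit (4) and the "mate unmapped" bit (8), and -f keeps a record only when both are set. A small sketch of that arithmetic follows; the variable names and the has_all_bits helper are hypothetical, and the flag values are illustrative examples.

```shell
# Combine individual flag bits into the value passed to -f.
READ_UNMAPPED=4
MATE_UNMAPPED=8
both_unmapped=$(( READ_UNMAPPED | MATE_UNMAPPED ))
echo "$both_unmapped"   # prints 12

# has_all_bits FLAG MASK: succeeds only when every bit of MASK is set in FLAG.
has_all_bits() { [ $(( $1 & $2 )) -eq "$2" ]; }

# 77 and 141: both reads unmapped; 73 and 133: only one read of the pair unmapped.
for flag in 77 141 73 133; do
    if has_all_bits "$flag" "$both_unmapped"; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
```

Flags 73 and 133 are dropped because each has only one of the two bits set.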
Please read my article on extracting single-end reads from SAM/BAM files.
This work is licensed under a Creative Commons Attribution 4.0 International License