Samtools: How to Filter Mapped and Unmapped Reads
Samtools is a suite of utilities for working with aligned sequence data in the SAM (Sequence Alignment/Map) and BAM (Binary Alignment/Map) formats, and it is widely used in bioinformatics and genomics analysis.
The samtools view command with the -F or -f parameter and a flag value is typically used to filter mapped and unmapped sequence reads from SAM/BAM files.
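The flag value is a bitwise integer in which each bit encodes one property of the read alignment. For example, the flag bit 4 (0x4) indicates that the sequence read does not have a valid alignment to the reference genome (an unmapped sequence read). The -f parameter keeps only the reads that have all of the given bits set, whereas the -F parameter removes the reads that have any of the given bits set.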
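To make the bitwise check concrete, here is a minimal sketch in POSIX shell. The function name is_unmapped and the example flag values are assumptions chosen for illustration; they are not part of samtools.

```shell
# Succeed (exit status 0) when bit 0x4 (read unmapped) is set in a SAM flag.
# is_unmapped is a hypothetical helper, not a samtools command.
is_unmapped() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Try a few illustrative flag values.
for flag in 0 4 16 77 99; do
    if is_unmapped "$flag"; then
        echo "$flag: unmapped"
    else
        echo "$flag: mapped"
    fi
done
```

A flag such as 77 (64 + 8 + 4 + 1) counts as unmapped because bit 4 is one of the bits it contains, even though the flag value itself is not 4.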
If you have not installed samtools, read this article on installing samtools.
The following examples demonstrate how to filter mapped and unmapped sequence reads from the BAM file using samtools.
Filter unmapped sequence reads
You can use the following command to extract the unmapped sequence reads from a BAM file using Samtools.
samtools view -b -f 4 input.bam > unmapped.bam
Here, the -b parameter specifies that the output should be in BAM format, and the -f 4 parameter selects the unmapped sequence reads (only unmapped sequence reads are retained in unmapped.bam).
The above command will create a new BAM file, unmapped.bam, which will contain only the unmapped reads from the input BAM file.
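To see what the -f 4 selection does to individual records, the following sketch mimics it on SAM text using plain POSIX shell. It is an illustration only, not how samtools is implemented; the function names keep_unmapped and sample_sam and the three sample records are made up for this example.

```shell
# Print only the SAM records whose FLAG (column 2) has bit 0x4 set,
# mimicking "samtools view -f 4" on text input. Hypothetical helper.
keep_unmapped() {
    while IFS="$(printf '\t')" read -r qname flag rest; do
        case $qname in @*) continue ;; esac   # skip header lines
        if [ $(( flag & 4 )) -ne 0 ]; then
            printf '%s\t%s\t%s\n' "$qname" "$flag" "$rest"
        fi
    done
}

# Made-up SAM text: one header line, two mapped reads, one unmapped read.
sample_sam() {
    printf '@HD\tVN:1.6\n'
    printf 'read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\n'
    printf 'read3\t16\tchr1\t200\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'
}

# Show the names of the records that pass the filter.
sample_sam | keep_unmapped | cut -f 1
```

Only read2 passes, because it is the only record whose flag has bit 4 set. For real data, samtools view is the better choice, since it reads BAM directly and is much faster.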
If you want to create an output file in SAM format, you can use the following command.
samtools view -f 4 input.bam > unmapped.sam
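The above command will create a new SAM file, unmapped.sam, which will contain only the unmapped reads from the input BAM file. By default, this output does not include the header; add the -h parameter if you also need the header lines.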
These commands work for single-end reads. When filtering mapped and unmapped sequence reads from paired-end data, it is also important to consider whether the paired-end reads are properly paired. Please read my article on extracting paired-end reads from a SAM/BAM file.
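As a rough sketch of how flag bits combine for paired-end data, the snippet below builds a few samtools view commands from the standard SAM flag bits 0x2 (properly paired), 0x4 (read unmapped), and 0x8 (mate unmapped). The variable names and output file names are assumptions for illustration, and the commands are only printed so that the snippet runs without samtools or an input file.

```shell
# Standard SAM flag bits; the variable names are made up for this sketch.
PROPER_PAIR=2      # 0x2: read is mapped in a proper pair
READ_UNMAPPED=4    # 0x4: the read itself is unmapped
MATE_UNMAPPED=8    # 0x8: the mate of the read is unmapped

# Combine bits with bitwise OR: 4 | 8 = 12.
BOTH=$(( READ_UNMAPPED | MATE_UNMAPPED ))

# -f keeps reads with ALL listed bits set; -F drops reads with ANY listed bit set.
echo "samtools view -b -f $BOTH input.bam > both_unmapped.bam"
echo "samtools view -b -f $READ_UNMAPPED -F $MATE_UNMAPPED input.bam > unmapped_mate_mapped.bam"
echo "samtools view -b -F $BOTH input.bam > both_mapped.bam"
echo "samtools view -b -f $PROPER_PAIR input.bam > proper_pairs.bam"
```

Which of these combinations is appropriate depends on the analysis, so treat them as starting points rather than a recommended recipe.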
Filter mapped sequence reads
You can use the following command to extract the mapped sequence reads from a BAM file using Samtools.
samtools view -b -F 4 input.bam > mapped.bam
Here, the -b parameter specifies that the output should be in BAM format, and the -F 4 parameter filters out the unmapped sequence reads (only mapped sequence reads are retained in mapped.bam).
The above command will create a new BAM file, mapped.bam, which will contain only the mapped reads from the input BAM file.
If you want to create an output file in SAM format, you can use the following command.
samtools view -F 4 input.bam > mapped.sam
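The above command will create a new SAM file, mapped.sam, which will contain only the mapped reads from the input BAM file. By default, this output does not include the header; add the -h parameter if you also need the header lines.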
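A simple sanity check after splitting a file is that the mapped and unmapped record counts add up to the total. The sketch below tallies both groups from SAM text in POSIX shell; the function names count_by_mapping and sample_records and the sample records are made up for this example.

```shell
# Count SAM records with and without flag bit 0x4 set. Hypothetical helper.
count_by_mapping() {
    mapped=0
    unmapped=0
    while IFS="$(printf '\t')" read -r qname flag rest; do
        case $qname in @*) continue ;; esac   # skip header lines
        if [ $(( flag & 4 )) -ne 0 ]; then
            unmapped=$(( unmapped + 1 ))
        else
            mapped=$(( mapped + 1 ))
        fi
    done
    echo "mapped=$mapped unmapped=$unmapped"
}

# Made-up SAM text: one header line, two mapped reads, one unmapped read.
sample_records() {
    printf '@HD\tVN:1.6\n'
    printf 'read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\n'
    printf 'read3\t16\tchr1\t200\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'
}

sample_records | count_by_mapping
```

With samtools installed, the same numbers can be obtained directly with the -c (count) parameter, for example samtools view -c -F 4 input.bam for mapped records and samtools view -c -f 4 input.bam for unmapped records. These are counts of alignment records, so a read with secondary or supplementary alignments is counted more than once.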
These commands work for single-end reads. When filtering mapped and unmapped sequence reads from paired-end data, it is also important to consider whether the paired-end reads are properly paired. Please read my article on extracting paired-end reads from a SAM/BAM file.
This work is licensed under a Creative Commons Attribution 4.0 International License