blastn: Command-line Utility for Nucleotide Sequence Search
blastn is a command-line utility from the NCBI BLAST+ toolkit for nucleotide-nucleotide sequence similarity searches using the BLAST algorithm. It compares query nucleotide sequences against a nucleotide BLAST database to identify similar, potentially homologous, sequences. To compare protein sequences against a protein BLAST database, use the blastp tool instead.
The general syntax of blastn looks like this:
# basic command
blastn -query query_fasta -db blast_nucl_db -outfmt output_format -out output_file
# command with commonly used advanced options
blastn -query query_fasta -db blast_nucl_db -evalue 1e-05 -perc_identity 60 \
-max_target_seqs 5 -num_threads 10 -outfmt output_format -out output_file
Where,
Parameter | Description |
---|---|
-query | Input nucleotide sequences in FASTA format to search against a nucleotide BLAST database |
-db | Formatted nucleotide BLAST database. See makeblastdb for creating a formatted BLAST database. |
-evalue | Expectation value (E-value) threshold for reporting hits (default 10). Hits with E-values above this threshold are not reported; lower E-values indicate more significant matches. |
-perc_identity | Minimum percent identity (0 to 100) an alignment must have to be reported |
-max_target_seqs | Maximum number of aligned sequences to report for each query in the output (default 500). A value of 5 or more is recommended. |
-num_threads | Number of threads (CPU cores) to use for the search (default 1). More threads can speed up the search, up to the number of available cores. |
-outfmt | Numerical value for a predefined output format, or a custom string specifying the fields to include in the BLAST output (e.g. "6 qseqid sseqid evalue") (default 0, pairwise) |
-out | Name of the output file where results will be saved |
In addition to these frequently used parameters, you can list all available parameters and their usage with the blastn -help command.
Note: blastn requires a formatted BLAST database. You can create one with the makeblastdb command or download a preformatted BLAST database from NCBI.
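As a minimal sketch of the makeblastdb route, the script below builds a nucleotide database named target_nucl_db from a FASTA file. The file name target_nucl.fasta and its two short sequences are hypothetical toy data used only so the script runs standalone; replace them with your own reference sequences. The -in, -dbtype nucl, and -out options are standard makeblastdb flags, and the value given to -out is the name you later pass to blastn -db.

```shell
#!/bin/sh
# Sketch: build a nucleotide BLAST database with makeblastdb.
# target_nucl.fasta and its sequences are hypothetical toy data.
set -eu

cat > target_nucl.fasta <<'EOF'
>tar1
ACGTACGTTAGCTAGCTAGGCTAGCATCGATCGATCGTAGCTAGCTAGCATCG
>tar2
TTGACCGATGCTAGCTAGCTAGGATCGATCGTACGATCGATGCTAGCTAGCAA
EOF

# -dbtype nucl builds a nucleotide database; -out sets the database name used by blastn -db.
if command -v makeblastdb >/dev/null 2>&1; then
  makeblastdb -in target_nucl.fasta -dbtype nucl -out target_nucl_db
else
  echo "makeblastdb not found; install NCBI BLAST+ first" >&2
fi
```

If the command succeeds, files such as target_nucl_db.nin, target_nucl_db.nhr, and target_nucl_db.nsq appear in the current directory, and blastn -db target_nucl_db can use them.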
The following examples explain how to use blastn on the command line for nucleotide-nucleotide sequence similarity searches.
Let’s say you have input query nucleotide sequences (input.fasta) and a formatted nucleotide database (target_nucl_db).
Run basic blastn command
blastn -query input.fasta -db target_nucl_db -outfmt 6 -out blastn_output.txt
The above blastn command compares the nucleotide sequences in input.fasta against the formatted target_nucl_db database and saves the results in tabular format (-outfmt 6) in the blastn_output.txt file.
The output should look like this:
head -n6 blastn_output.txt
seq1 tar4 100.000 95 0 0 1 95 1 95 4.24e-49 176
seq1 tar3 100.000 95 0 0 1 95 1 95 4.24e-49 176
seq2 tar2 100.000 101 0 0 1 101 102 202 2.49e-52 187
seq3 tar2 100.000 101 0 0 1 101 1 101 2.06e-52 187
seq4 tar4 100.000 101 0 0 1 101 1 101 2.54e-52 187
seq4 tar3 100.000 101 0 0 1 101 1 101 2.54e-52 187
The columns in the output file (with -outfmt 6) represent query id, target id, % identical matches, alignment length, mismatches, gap openings, query start, query end, target start, target end, evalue, and bitscore.
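Because -outfmt 6 output is plain tab-separated text, standard Unix tools can post-process it. The POSIX shell sketch below keeps only the highest-scoring hit (largest bitscore, column 12) for each query. So that it runs standalone, it recreates the example rows shown above in a hypothetical file named example_output.txt; with real results, run the sort and awk line on your own blastn_output.txt instead. Ties in bitscore (seq1 hits tar4 and tar3 with the same score) are resolved by sort's default line ordering, so the kept hit is one of possibly several equally good hits.

```shell
#!/bin/sh
# Keep the best hit (highest bitscore) per query from -outfmt 6 output.
# example_output.txt is a hypothetical file holding the example rows shown above.
set -eu

# Recreate the example output as a tab-separated file, as blastn writes it.
tr ' ' '\t' > example_output.txt <<'EOF'
seq1 tar4 100.000 95 0 0 1 95 1 95 4.24e-49 176
seq1 tar3 100.000 95 0 0 1 95 1 95 4.24e-49 176
seq2 tar2 100.000 101 0 0 1 101 102 202 2.49e-52 187
seq3 tar2 100.000 101 0 0 1 101 1 101 2.06e-52 187
seq4 tar4 100.000 101 0 0 1 101 1 101 2.54e-52 187
seq4 tar3 100.000 101 0 0 1 101 1 101 2.54e-52 187
EOF

# Sort by query id (column 1), then by bitscore (column 12) in descending numeric order,
# and print only the first line seen for each query id.
sort -k1,1 -k12,12nr example_output.txt | awk '!seen[$1]++' > best_hits.txt
cat best_hits.txt
```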
Run blastn command with customized options
blastn -query input.fasta -db target_nucl_db -evalue 1e-05 -perc_identity 60 -max_target_seqs 5 -num_threads 10 \
-outfmt "6 qseqid qlen sseqid slen qstart qend sstart send nident pident length mismatch gaps qcovs evalue bitscore" \
-out blastn_output.txt
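The above blastn command compares the nucleotide sequences in input.fasta against target_nucl_db with the given parameter cut-offs (E-value threshold of 1e-05, minimum 60% identity, and at most 5 target sequences per query), using 10 threads, and saves the results in tabular format with customized fields in the blastn_output.txt file.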
The output should look like this:
head -n6 blastn_output.txt
seq1 95 tar4 101 1 95 1 95 95 100.000 95 0 0 100 4.24e-49 176
seq1 95 tar3 224 1 95 1 95 95 100.000 95 0 0 100 4.24e-49 176
seq2 120 tar2 202 1 101 102 202 101 100.000 101 0 0 84 2.49e-52 187
seq3 101 tar2 202 1 101 1 101 101 100.000 101 0 0 100 2.06e-52 187
seq4 122 tar4 101 1 101 1 101 101 100.000 101 0 0 83 2.54e-52 187
seq4 122 tar3 224 1 101 1 101 101 100.000 101 0 0 83 2.54e-52 187
The columns in the output file correspond, in order, to the custom fields specified in the -outfmt parameter.
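Custom tabular output has no header line, so it can help to add one and to filter on fields that blastn itself does not cut on, such as query coverage (qcovs). The shell sketch below writes the field names from the -outfmt string as a header and keeps only hits with qcovs (column 14 in this field order) of at least 90%. The 90% cut-off and the file names example_custom_output.txt and blastn_filtered.txt are hypothetical choices for illustration; the example rows are the ones shown above.

```shell
#!/bin/sh
# Add a header to custom -outfmt 6 output and keep hits with query coverage >= 90%.
# File names and the 90% cut-off are hypothetical illustration choices.
set -eu

# Recreate the example custom-format output (16 tab-separated columns).
tr ' ' '\t' > example_custom_output.txt <<'EOF'
seq1 95 tar4 101 1 95 1 95 95 100.000 95 0 0 100 4.24e-49 176
seq1 95 tar3 224 1 95 1 95 95 100.000 95 0 0 100 4.24e-49 176
seq2 120 tar2 202 1 101 102 202 101 100.000 101 0 0 84 2.49e-52 187
seq3 101 tar2 202 1 101 1 101 101 100.000 101 0 0 100 2.06e-52 187
seq4 122 tar4 101 1 101 1 101 101 100.000 101 0 0 83 2.54e-52 187
seq4 122 tar3 224 1 101 1 101 101 100.000 101 0 0 83 2.54e-52 187
EOF

# Column names, in the same order as the -outfmt "6 ..." field list used above.
fields="qseqid qlen sseqid slen qstart qend sstart send nident pident length mismatch gaps qcovs evalue bitscore"

# Print a tab-separated header, then the rows whose qcovs (column 14) is at least 90.
{
  printf '%s\n' "$fields" | tr ' ' '\t'
  awk -F '\t' '$14 >= 90' example_custom_output.txt
} > blastn_filtered.txt
cat blastn_filtered.txt
```

With this cut-off, the seq2 and seq4 hits (query coverage 84% and 83%) are dropped, because only about 101 of their 120 and 122 bases are aligned.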
This work is licensed under a Creative Commons Attribution 4.0 International License