Create a Local BLAST Database From a FASTA File
A local BLAST database lets you run fast and efficient sequence searches on your own machine using the NCBI BLAST tool.
With a local BLAST database, you can search against a specific set of sequences instead of the whole NCBI database. A local BLAST database is also useful for reproducible sequence searches.
The NCBI BLAST executables include the makeblastdb utility for creating a local BLAST database from nucleotide or protein sequences.
The general syntax of makeblastdb looks like this:
# for nucleotide sequences
makeblastdb -in input.fasta -dbtype nucl -parse_seqids -out test
# for protein sequences
makeblastdb -in input.fasta -dbtype prot -parse_seqids -out test
Where,
Parameter | Description |
---|---|
-in | Input FASTA file (nucleotide or protein) used to create the BLAST database |
-dbtype | Molecule type (“nucl” for nucleotide and “prot” for protein sequences) |
-parse_seqids | Enable sequence ID parsing. This is useful for extracting sequences by their IDs using blastdbcmd. Optional, but recommended |
-out | Name of the database (defaults to the input FASTA file name). Optional |
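Because -parse_seqids indexes sequences by their identifiers, makeblastdb generally stops with a duplicate-ID error when two records share the same ID. The sketch below is a simple pre-flight check that uses only standard shell tools. The function names fasta_count and fasta_duplicate_ids are hypothetical helpers, not part of BLAST, and the ID is assumed to be the header text up to the first whitespace.

```shell
# Hypothetical helpers (not part of BLAST) for checking a FASTA file
# before passing it to makeblastdb.

# Print the number of records (header lines starting with ">").
fasta_count() {
    grep -c '^>' "$1"
}

# Print each sequence ID that occurs more than once.
# Assumption: the ID is the header text between ">" and the first whitespace.
fasta_duplicate_ids() {
    awk '/^>/ { id = substr($1, 2); if (++seen[id] == 2) print id }' "$1"
}
```

For a file with three uniquely named records, such as the sample FASTA files used in the examples on this page, fasta_count prints 3 and fasta_duplicate_ids prints nothing.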
The following examples explain how to use makeblastdb to create a local BLAST database from a FASTA file.
Create a BLAST database for nucleotide sequences
For example, suppose the sample_nucl.fasta file contains the following DNA sequences:
# example FASTA file (sample_nucl.fasta)
>seq1
TTCAGTTCCTCCATCTCTCTAAGCTGTTTTTCAGAAATGGTGTCTGGGTTGGAGACATCAAGA
>seq2
CTTCACGATCACGAATCACGATTACATAAACTCCACAACTTCACGGTTCCTTCCAATCAGTTCCAGTGT
>seq3
TTTTTGAGAGCTGGAACTATCTGGAGCATCAATTTTCCCAGGATTAGGGAATTGACATCTCT
Now, use makeblastdb to create a local BLAST database from the DNA sequences:
makeblastdb -in sample_nucl.fasta -dbtype nucl -parse_seqids -out sample
Building a new DB, current time: 07/30/2023 22:16:52
New DB name: /home/renesh/lin_proj/atha_eg/temp/sample
New DB title: sample_nucl.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3 sequences in 0.0218239 seconds.
Once makeblastdb completes successfully, you should see the database files sample.nhr, sample.nin, sample.nog, sample.nsd, sample.nsi, and sample.nsq.
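To confirm that the database was written, a quick check along these lines may help. This is a minimal sketch: missing_db_files is a hypothetical helper, not a BLAST command, and the extensions are taken from the file list above. Other BLAST+ versions or options may write a different set of files.

```shell
# Hypothetical helper (not part of BLAST): print any expected database
# file that is missing for a given database prefix.
# Usage: missing_db_files PREFIX EXT [EXT ...]
missing_db_files() {
    prefix=$1
    shift
    for ext in "$@"; do
        [ -f "$prefix.$ext" ] || printf '%s\n' "$prefix.$ext"
    done
}

# Extensions taken from the file list above; prints nothing when all exist.
missing_db_files sample nhr nin nog nsd nsi nsq
```

Empty output means every listed file exists in the current directory; any printed name points to a file that was not created.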
Note: If your input FASTA file is large (> 4GB; greater than the -max_file_sz parameter), the BLAST database will be split into multiple parts. For example, you should see database files like sample.00.nhr, sample.01.nhr, and so on.
You can use this formatted BLAST database to run local sequence searches (e.g. with blastn) against a query sequence in FASTA format.
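As a minimal sketch of such a search, the wrapper below runs blastn against a nucleotide database and writes tabular output. The function name search_nucl_db and the file names query.fasta and hits.tsv are hypothetical. -query, -db, -outfmt, and -out are standard blastn options, and -outfmt 6 selects tab-separated output.

```shell
# Hypothetical wrapper: search a nucleotide BLAST database with blastn.
# Usage: search_nucl_db QUERY_FASTA DB_NAME OUT_FILE
search_nucl_db() {
    # -outfmt 6 writes one tab-separated line per hit
    blastn -query "$1" -db "$2" -outfmt 6 -out "$3"
}

# Example call (assumes blastn is installed and query.fasta exists):
# search_nucl_db query.fasta sample hits.tsv
```

The database is referenced by the name given to -out in makeblastdb (here, sample), without any file extension.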
Create a BLAST database for protein sequences
For example, suppose the sample_prot.fasta file contains the following protein sequences:
# example FASTA file (sample_prot.fasta)
>seq1
MERLNSKLYVENCYIMKENEKLRKKAELLNQENQQLLVQLKQKLSKANKNPNGSNNDNNVSSSSSASGKS
>seq2
KQKLSKANKNPNGSNNDNNVSSSSSASGKSNCYIMKENEKLRKKAELLNQENQQLL
>seq3
KLRKKAELLNQENQQLLVQLKQKLSKLVQLKQKLSKANKNPNGSNNDNNVSSSSNSKLYVENCYIMKEN
Now, use makeblastdb to create a BLAST database from the protein sequences:
makeblastdb -in sample_prot.fasta -dbtype prot -parse_seqids -out sample
Building a new DB, current time: 07/30/2023 22:25:49
New DB name: /home/renesh/lin_proj/atha_eg/temp/sample
New DB title: sample_prot.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3 sequences in 0.00608587 seconds.
Once makeblastdb completes successfully, you should see the database files sample.phr, sample.pin, sample.pog, sample.psd, sample.psi, and sample.psq.
You can use this formatted BLAST database to run local sequence searches (e.g. with blastp) against a query sequence in FASTA format.
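The sketch below shows one way to search the protein database with blastp and to retrieve a single sequence by its ID with blastdbcmd, which relies on the database having been built with -parse_seqids. The function names search_prot_db and fetch_seq and the file names query_prot.fasta and hits.tsv are hypothetical. Because the nucleotide and protein examples on this page both use the database name sample, passing -dbtype to blastdbcmd explicitly avoids ambiguity when both sets of files sit in the same directory.

```shell
# Hypothetical wrapper: search a protein BLAST database with blastp.
# Usage: search_prot_db QUERY_FASTA DB_NAME OUT_FILE
search_prot_db() {
    # -outfmt 6 writes one tab-separated line per hit
    blastp -query "$1" -db "$2" -outfmt 6 -out "$3"
}

# Hypothetical wrapper: print one sequence from a BLAST database by its ID.
# Usage: fetch_seq DB_NAME DB_TYPE SEQ_ID   (DB_TYPE is nucl or prot)
fetch_seq() {
    blastdbcmd -db "$1" -dbtype "$2" -entry "$3"
}

# Example calls (assume BLAST+ is installed and the files exist):
# search_prot_db query_prot.fasta sample hits.tsv
# fetch_seq sample prot seq2
```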
This work is licensed under a Creative Commons Attribution 4.0 International License