Convert FASTQ to FASTA Format
The FASTQ and FASTA file formats are widely used in bioinformatics data analysis.
A FASTQ file stores nucleotide sequences together with their per-base quality scores, whereas a FASTA file stores only the nucleotide sequences.
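To make the difference concrete, here is a minimal Python sketch. It assumes the common layout in which each FASTQ record spans exactly four lines (header, sequence, + separator, quality). The function name fastq_record_to_fasta is hypothetical. It keeps the header and sequence, swaps the leading @ for >, and drops the last two lines.

```python
from typing import Sequence


def fastq_record_to_fasta(record: Sequence[str]) -> str:
    """Convert one 4-line FASTQ record into a 2-line FASTA record (hypothetical helper)."""
    header, sequence, separator, quality = record
    # A FASTQ header starts with '@' and the separator line starts with '+'
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a 4-line FASTQ record")
    # Each base has exactly one quality character
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Keep the header ('@' becomes '>') and the sequence; drop the '+' line and the qualities
    return ">" + header[1:] + "\n" + sequence + "\n"


example = ["@read1 demo", "ACGTAC", "+", "IIIII#"]
print(fastq_record_to_fasta(example), end="")
# >read1 demo
# ACGTAC
```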
You can convert a FASTQ file into a FASTA file with any of the following tools:
seqtk
You can use the seqtk seq command to convert FASTQ to FASTA as follows:
# with compressed FASTQ
seqtk seq -a sample.fastq.gz > sample.fasta
# with uncompressed FASTQ
seqtk seq -a sample.fastq > sample.fasta
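As the two commands above show, seqtk accepts both compressed and uncompressed input. The Python sketch below illustrates what the conversion does and streams the file record by record. It assumes 4-line FASTQ records. Unlike seqtk, it decides whether the input is gzip-compressed from the .gz file suffix alone. The function names are hypothetical and are not part of seqtk.

```python
import gzip
import os
import tempfile
from typing import TextIO


def convert_fastq_stream(fin: TextIO, fout: TextIO) -> int:
    """Write one FASTA record per 4-line FASTQ record; return the number of records."""
    count = 0
    while True:
        header = fin.readline()
        if not header:  # end of input
            return count
        sequence = fin.readline().rstrip("\r\n")
        separator = fin.readline()
        quality = fin.readline()
        if not header.startswith("@") or not separator.startswith("+") or not quality:
            raise ValueError(f"malformed FASTQ record #{count + 1}")
        # Only the header (with '>') and the sequence are written; qualities are dropped
        fout.write(">" + header[1:].rstrip("\r\n") + "\n" + sequence + "\n")
        count += 1


def fastq_file_to_fasta(in_path: str, out_path: str) -> int:
    """Convert a plain or gzip-compressed FASTQ file (judged by the .gz suffix) to FASTA."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        return convert_fastq_stream(fin, fout)


# Usage: write a tiny gzip-compressed FASTQ file to a temporary directory and convert it
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.fastq.gz")
    with gzip.open(gz_path, "wt") as fh:
        fh.write("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nJJJJ\n")
    fasta_path = os.path.join(tmp, "sample.fasta")
    n_records = fastq_file_to_fasta(gz_path, fasta_path)
    with open(fasta_path) as fh:
        fasta_text = fh.read()

print(n_records)  # 2
print(fasta_text, end="")
```

Streaming line by line keeps memory use constant, which matters for FASTQ files with millions of reads.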
reformat (from BBTools)
You can use the reformat.sh script from BBTools to convert FASTQ to FASTA as follows:
reformat.sh in=sample.fastq out=sample.fasta
If you have paired-end reads, you can convert both read files (read1 and read2) in a single command:
reformat.sh in1=read1.fastq in2=read2.fastq out1=read1.fasta out2=read2.fasta
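The paired-end command converts the two mate files together. The Python sketch below illustrates that idea. It assumes that read1 and read2 list mates in the same order and that each record spans four lines. The two files are read in lockstep, and the function raises an error if one file runs out before the other. This illustrates the concept only and is not how reformat.sh is implemented. The names convert_pair and fastq_records are hypothetical.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple


def fastq_records(fin: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header without '@', sequence) for each 4-line FASTQ record."""
    while True:
        header = fin.readline()
        if not header:
            return
        sequence = fin.readline().rstrip("\r\n")
        separator = fin.readline()
        quality = fin.readline()
        if not header.startswith("@") or not separator.startswith("+") or not quality:
            raise ValueError("malformed FASTQ record")
        yield header[1:].rstrip("\r\n"), sequence


def convert_pair(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO) -> int:
    """Convert read1 and read2 in lockstep so the two FASTA files stay in mate order."""
    pairs = 0
    for rec1, rec2 in zip_longest(fastq_records(r1_in), fastq_records(r2_in)):
        # zip_longest pads the shorter file with None, which signals unpaired reads
        if rec1 is None or rec2 is None:
            raise ValueError(f"read1 and read2 differ in record count after {pairs} pairs")
        r1_out.write(f">{rec1[0]}\n{rec1[1]}\n")
        r2_out.write(f">{rec2[0]}\n{rec2[1]}\n")
        pairs += 1
    return pairs


# Usage with in-memory files standing in for read1.fastq and read2.fastq
read1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nTTAA\n+\nIIII\n")
read2 = io.StringIO("@p1/2\nCCGG\n+\nIIII\n@p2/2\nGATC\n+\nIIII\n")
out1, out2 = io.StringIO(), io.StringIO()
print(convert_pair(read1, read2, out1, out2))  # 2
```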
seqret
You can use the seqret tool (from EMBOSS) to convert FASTQ to FASTA as follows:
seqret -sequence sample.fastq -outseq sample.fasta
fastq_to_fasta
You can use the fastq_to_fasta tool (from FASTX-Toolkit) to convert FASTQ to FASTA as follows:
fastq_to_fasta -Q 33 -i sample.fastq -o sample.fasta
The -Q 33 parameter tells fastq_to_fasta that the quality scores use Phred+33 encoding (an ASCII offset of 33), as in Illumina 1.8+ FASTQ files.
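The offset matters because, in Phred+33 encoding, each quality character stands for a score equal to its ASCII code minus 33. The small Python sketch below decodes those scores (the function names are hypothetical). It also shows what a FASTA file gives up, because these per-base scores are discarded during conversion.

```python
from typing import List


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P) to get the base-call error probability."""
    return 10 ** (-q / 10)


# First seven quality characters of read SRR22309490.1 from the example later on this page
scores = phred33_scores("AAFFFJJ")
print(scores)  # [32, 32, 37, 37, 37, 41, 41]
print(f"{error_probability(41):.1e}")
```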
bioinfokit
bioinfokit is a Python package that can be used for FASTQ to FASTA conversion as follows:
# import package
from bioinfokit import analys
# convert FASTQ to FASTA
analys.format.fqtofa(file="sample.fastq")
The output FASTA file is saved as output.fasta in the same directory.
awk
awk can be used for FASTQ to FASTA conversion as follows:
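awk 'NR%4 == 1 {print ">" substr($0, 2)} NR%4 == 2 {print}' sample.fastq > sample.fasta
This replaces only the leading @ of each header line, so any other @ in a header is preserved. It assumes every FASTQ record spans exactly four lines (no wrapped sequences).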
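The awk approach relies on line positions rather than on the @ character. This matters because @ is also a valid quality character, so a quality line can begin with it. The Python sketch below applies the same position-based logic. It assumes unwrapped 4-line records, and fastq_lines_to_fasta is a hypothetical name.

```python
import io
from typing import Iterable, Iterator


def fastq_lines_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Keep lines 1 and 2 of every 4-line record, like awk's NR%4 == 1 || NR%4 == 2."""
    for line_number, line in enumerate(lines, start=1):  # line_number plays the role of NR
        if line_number % 4 == 1:
            # Header: swap only the leading '@' for '>', so a later '@' in the header survives
            yield ">" + line[1:]
        elif line_number % 4 == 2:
            yield line  # sequence line, unchanged
        # line_number % 4 == 3 ('+' line) and 0 (quality line) are skipped


# The second record's quality line starts with '@' but is still skipped by position
fastq_text = "@r1 run@lab\nACGT\n+\nIIII\n@r2\nGGCC\n+\n@@@@\n"
print("".join(fastq_lines_to_fasta(io.StringIO(fastq_text))), end="")
# >r1 run@lab
# ACGT
# >r2
# GGCC
```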
Detailed example of FASTQ to FASTA conversion using seqtk
The following example demonstrates how to convert FASTQ to FASTA with a sample FASTQ file. You can download the sample FASTQ file using this link.
View the FASTQ file:
# view first few sequences
head sample.fastq
@SRR22309490.1 1 length=101
CTGTTTTGTCTATTTTTGTTTGGTGCATTAGCTCCAATTGTGAACGTTAATTATGGAGGAATTAGTGGTGCTTTTTATGGGAACTATAGATCTAATTATAT
+SRR22309490.1 1 length=101
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@SRR22309490.2 2 length=101
ACCGTATATGTTTTCTATGTTCTCCACCGCAACATACTCTCCTTGTGAGAGTTTAAAGATATTCTTCTTCCTGTCAATTATCTTCATGCTTCCATCTGGTT
+SRR22309490.2 2 length=101
<AAF<J7<<JJJJJJJJFJFF<FJFFJJJJJJJJJJJFJ-FJJFJJJJJJJJJJJFJJF<FJJJJJJJJFJJJJJJJJJJFFJJFFAJJFJFFJJ<FF-FA
@SRR22309490.3 3 length=101
CTCCACTACTATCTCTTCTTCTTTGGAATATCTCCACGGAAAATCATCTTCACAAAAGCGAGATATTCCATTATCGCACCAAAAGTGTCTATGTGAACCCA
Now, convert FASTQ to FASTA using seqtk:
# convert FASTQ to FASTA
seqtk seq -a sample.fastq > sample.fasta
# view first few sequences from FASTA
head sample.fasta
>SRR22309490.1 1 length=101
CTGTTTTGTCTATTTTTGTTTGGTGCATTAGCTCCAATTGTGAACGTTAATTATGGAGGAATTAGTGGTGCTTTTTATGGGAACTATAGATCTAATTATAT
>SRR22309490.2 2 length=101
ACCGTATATGTTTTCTATGTTCTCCACCGCAACATACTCTCCTTGTGAGAGTTTAAAGATATTCTTCTTCCTGTCAATTATCTTCATGCTTCCATCTGGTT
>SRR22309490.3 3 length=101
CTCCACTACTATCTCTTCTTCTTTGGAATATCTCCACGGAAAATCATCTTCACAAAAGCGAGATATTCCATTATCGCACCAAAAGTGTCTATGTGAACCCA
>SRR22309490.4 4 length=101
CCATGACCTTGGATACAACTTGCCTAGTGGGTCATGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCT
>SRR22309490.5 5 length=101
CTCGCAGTTGACTCATACTTAGCTCTATCGGTTTTGTACATGTGAGCAATCTCTGGAACCAATGGATCATCTGGGTTTGGGTCCGTTAACAATGAACATAT
Similarly, you can convert FASTQ files to FASTA files using the other tools described above.
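Whichever tool you use, a quick sanity check is to confirm that the FASTA file contains the same reads, in the same order, as the FASTQ file. The Python sketch below does this check (all function names are hypothetical) and assumes 4-line FASTQ records. It compares only the read ID (the first word of the header) and the sequence. It also joins sequence lines, in case a tool wraps long sequences in its FASTA output.

```python
import io
from typing import List, TextIO, Tuple


def _read_id(header: str) -> str:
    """Use the first whitespace-separated token of a header as the read ID."""
    parts = header.split()
    return parts[0] if parts else ""


def read_fastq(fin: TextIO) -> List[Tuple[str, str]]:
    """Return (read ID, sequence) pairs from a FASTQ stream with 4-line records."""
    lines = fin.read().splitlines()
    while lines and not lines[-1]:  # tolerate trailing blank lines
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return [(_read_id(lines[i][1:]), lines[i + 1]) for i in range(0, len(lines), 4)]


def read_fasta(fin: TextIO) -> List[Tuple[str, str]]:
    """Return (read ID, sequence) pairs from FASTA, joining sequences wrapped over lines."""
    records: List[Tuple[str, str]] = []
    for line in fin.read().splitlines():
        if line.startswith(">"):
            records.append((_read_id(line[1:]), ""))
        elif line and records:
            name, seq = records[-1]
            records[-1] = (name, seq + line)
    return records


def conversion_matches(fastq_in: TextIO, fasta_in: TextIO) -> bool:
    """True if both files hold the same read IDs and sequences in the same order."""
    return read_fastq(fastq_in) == read_fasta(fasta_in)


fastq = io.StringIO("@r1 1 length=4\nACGT\n+\nIIII\n@r2 2 length=2\nGG\n+\nJJ\n")
fasta = io.StringIO(">r1 1 length=4\nAC\nGT\n>r2 2 length=2\nGG\n")
print(conversion_matches(fastq, fasta))  # True
```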
This work is licensed under a Creative Commons Attribution 4.0 International License