Find Max and Min Sequence Length in Fasta

Renesh Bedre

You can find the maximum and minimum sequence lengths in a FASTA file with a Python package or with command-line tools.

This article shows how to do this using Python (bioinfokit), seqkit, and samtools.

Python

You can use the max_min_len() function from bioinfokit (v2.1.4) to find the maximum and minimum sequence lengths in a FASTA file.

# import package
from bioinfokit.analys import Fasta

Fasta.max_min_len("file.fasta")

# output
Max Length Seq: KU562861.1 153
Min Length Seq: MH150936.1 114

In the example file.fasta, the maximum and minimum sequence lengths are 153 bp and 114 bp, respectively.
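If bioinfokit is not installed, the same idea can be sketched with the Python standard library alone. This is a minimal illustration, not how bioinfokit works internally: the function names fasta_lengths and longest_and_shortest are hypothetical, the sequence ID is assumed to be the first whitespace-delimited token of the header line, and the example records are made up.

```python
def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in an iterable of FASTA lines."""
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length  # emit the previous record
            header = line[1:].split()
            # Assumption: the ID is the first token of the header
            seq_id, length = (header[0] if header else ""), 0
        elif seq_id is not None:
            length += len(line)  # sequences may span several lines
    if seq_id is not None:
        yield seq_id, length  # emit the last record


def longest_and_shortest(lines):
    """Return ((id, length) of the longest, (id, length) of the shortest) record."""
    records = list(fasta_lengths(lines))
    if not records:
        raise ValueError("no FASTA records found")
    longest = max(records, key=lambda rec: rec[1])
    shortest = min(records, key=lambda rec: rec[1])
    return longest, shortest


# Made-up records for illustration only
example = [">seqA first record", "ACGTAC", "GT", ">seqB", "ACG"]
longest, shortest = longest_and_shortest(example)
print("Max Length Seq:", *longest)   # Max Length Seq: seqA 8
print("Min Length Seq:", *shortest)  # Min Length Seq: seqB 3
```

To run it on a real file, pass an open file handle instead of the list, for example by opening file.fasta with open() and passing the handle to longest_and_shortest().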

seqkit

You can use the fx2tab command from seqkit to find the maximum and minimum sequence lengths in a FASTA file.

# get max length (sort ascending, take the last value)
seqkit fx2tab --length --name file.fasta | cut -f2 | sort -n | tail -1

# output
153

# get min length (sort ascending, take the first value)
seqkit fx2tab --length --name file.fasta | cut -f2 | sort -n | head -1

# output
114

samtools

You can also use a samtools FASTA index to find the maximum and minimum sequence lengths.

You first need to create an index of the fasta file.

samtools faidx file.fasta

The first two columns of the index file (file.fasta.fai) contain the sequence names and their lengths.

You can get the maximum and minimum sequence lengths from file.fasta.fai like this:

# get max length (sort ascending, take the last value)
cut -f2 file.fasta.fai | sort -n | tail -1

# output
153

# get min length (sort ascending, take the first value)
cut -f2 file.fasta.fai | sort -n | head -1

# output
114
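If you also want to know which sequences have those lengths, you can read the index in Python. The sketch below assumes the standard tab-separated .fai layout (name, length, offset, bases per line, bytes per line) and uses only the first two columns. The function name fai_length_extremes is hypothetical and the example index lines are made up.

```python
def fai_length_extremes(lines):
    """Return ((name, length) of the longest, (name, length) of the shortest)
    sequence from an iterable of .fai index lines."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Column 1 is the sequence name, column 2 is its length
        records.append((fields[0], int(fields[1])))
    if not records:
        raise ValueError("no index records found")
    longest = max(records, key=lambda rec: rec[1])
    shortest = min(records, key=lambda rec: rec[1])
    return longest, shortest


# Made-up index lines for illustration only
example_fai = ["seqA\t8\t6\t8\t9\n", "seqB\t5\t21\t5\t6\n"]
longest, shortest = fai_length_extremes(example_fai)
print("Max Length Seq:", *longest)   # Max Length Seq: seqA 8
print("Min Length Seq:", *shortest)  # Min Length Seq: seqB 5
```

For a real index, open file.fasta.fai with open() and pass the file handle to fai_length_extremes().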



This work is licensed under a Creative Commons Attribution 4.0 International License
