Split the sequence into smaller subsequences

Renesh Bedre

This article explains how to split a nucleotide sequence into smaller subsequences of a desired size.

A nucleotide sequence can be split into smaller subsequences either with overlap or without overlap.

Split the sequences with overlap

In overlap mode, a window of the chosen size slides along the sequence 1 bp at a time, so consecutive subsequences overlap (e.g., ATGC splits into ATG and TGC with the default size of 3).

To run this code, install bioinfokit v2.0.6 or later

from bioinfokit import analys
# seq_overlap is left at its default, so windows of 3 bases advance 1 bp at a time
analys.Fasta.split_seq(seq='ATGCAT', seq_size=3)
# output
['ATG', 'TGC', 'GCA', 'CAT']

# Note: to save the subsequences to a FASTA file, add the parameter outfmt='fasta'

Check more usage here
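
To make the sliding-window idea concrete, here is a minimal plain-Python sketch of overlap splitting. It is not bioinfokit's implementation; the function name split_overlapping and its size parameter are hypothetical and only illustrate the behaviour described in this section.

```python
def split_overlapping(seq, size=3):
    """Return every window of `size` bases, moving 1 bp at a time.

    Hypothetical helper for illustration; not part of bioinfokit.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # A sequence of length n yields n - size + 1 windows (none if n < size).
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


print(split_overlapping("ATGCAT", size=3))
# ['ATG', 'TGC', 'GCA', 'CAT']
```

A sequence shorter than the window size produces an empty list in this sketch.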

Split the sequences without overlap

In non-overlapping mode, the sequence is cut into consecutive subsequences that share no bases (e.g., ATGCAT splits into ATG and CAT with the default size of 3).

from bioinfokit import analys
# seq_overlap=False turns off the sliding window, so each base appears in only one subsequence
analys.Fasta.split_seq(seq='ATGCAT', seq_size=3, seq_overlap=False)
# output
['ATG', 'CAT']

# Note: to save the subsequences to a FASTA file, add the parameter outfmt='fasta'

Check more usage here
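
Non-overlapping splitting can likewise be sketched in plain Python. This is an illustration, not bioinfokit's code: the function name split_non_overlapping and the keep_remainder flag are hypothetical. This article does not say how bioinfokit treats leftover bases when the sequence length is not a multiple of the size, so the sketch makes that choice explicit and drops them by default.

```python
def split_non_overlapping(seq, size=3, keep_remainder=False):
    """Cut `seq` into consecutive chunks of `size` bases that share no bases.

    Hypothetical helper for illustration; not part of bioinfokit.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Step through the sequence in jumps of `size` so chunks never overlap.
    chunks = [seq[i:i + size] for i in range(0, len(seq), size)]
    # Assumption: a trailing chunk shorter than `size` is dropped unless requested.
    if not keep_remainder and chunks and len(chunks[-1]) < size:
        chunks.pop()
    return chunks


print(split_non_overlapping("ATGCAT", size=3))
# ['ATG', 'CAT']
```

Passing keep_remainder=True keeps the short trailing chunk, so 'ATGCATG' would give ['ATG', 'CAT', 'G'].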

If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com

This work is licensed under a Creative Commons Attribution 4.0 International License