Split the sequence into smaller subsequences
This article explains how to split a nucleotide sequence into smaller subsequences of a desired size.
A nucleotide sequence can be split into smaller subsequences either with or without overlap.
Split the sequences with overlap
In overlap mode, the sequence is split using a sliding window that advances 1 bp at a time, so consecutive subsequences share all but one base (e.g., ATGC is split into ATG and TGC with the default size of 3).
To run this code, install bioinfokit v2.0.6 or later.
from bioinfokit import analys
analys.Fasta.split_seq(seq='ATGCAT', seq_size=3)
# output
['ATG', 'TGC', 'GCA', 'CAT']
# Note: to save the subsequences to a FASTA file, add the parameter outfmt='fasta'
Check more usage here
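To make the sliding-window idea concrete, here is a minimal pure-Python sketch of overlap mode. It is not bioinfokit's implementation; the function name `split_overlapping` and its `size` parameter are hypothetical and used only for illustration.

```python
def split_overlapping(seq, size=3):
    """Return every window of length `size`, advancing 1 bp at a time.

    Hypothetical helper for illustration; not part of bioinfokit.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    # A sequence of length n yields n - size + 1 windows (none if n < size).
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


print(split_overlapping("ATGCAT", size=3))
# ['ATG', 'TGC', 'GCA', 'CAT']
```

Because the window moves one base at a time, a sequence of length n produces n - size + 1 subsequences, which is why the 6 bp example gives four 3 bp windows.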
Split the sequences without overlap
In non-overlap mode, the sequence is cut into consecutive subsequences that share no bases (e.g., ATGCAT is split into ATG and CAT with the default size of 3)
from bioinfokit import analys
analys.Fasta.split_seq(seq='ATGCAT', seq_size=3, seq_overlap=False)
# output
['ATG', 'CAT']
# Note: to save the subsequences to a FASTA file, add the parameter outfmt='fasta'
Check more usage here
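The non-overlapping split can be sketched in plain Python as well. Again, this is an illustration and not bioinfokit's implementation; `split_non_overlapping` is a hypothetical name. The sketch assumes that a shorter trailing chunk is kept when the sequence length is not a multiple of the size; how bioinfokit handles that case is not covered here.

```python
def split_non_overlapping(seq, size=3):
    """Cut `seq` into consecutive chunks of length `size`.

    Hypothetical helper for illustration; not part of bioinfokit.
    Assumption: a shorter trailing chunk is kept rather than dropped.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    # Step through the sequence in jumps of `size` so chunks never overlap.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


print(split_non_overlapping("ATGCAT", size=3))
# ['ATG', 'CAT']
print(split_non_overlapping("ATGCATG", size=3))
# ['ATG', 'CAT', 'G']
```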
If you have any questions, comments or recommendations, please email me at reneshbe@gmail.com
This work is licensed under a Creative Commons Attribution 4.0 International License