Calculate All Possible Combinations of DNA Bases
There are four possible nucleotide bases (A, T, G, and C) in a DNA sequence. Sometimes in genomic analysis, you need to list all possible combinations of nucleotide bases of a certain length.
You can use the formula 4^n (4 raised to the power n) to calculate the number of all possible combinations of DNA bases of a given length, where n is the length of the combination. Each of the n positions can hold any of the four bases, and order matters, so "AT" and "TA" are counted separately.
For example, there are 16 possible combinations of two nucleotide bases, 64 possible combinations of three nucleotide bases, and so on. The number of possible combinations grows exponentially with the sequence length.
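As a quick check of the counts above, here is a minimal sketch that computes 4 to the power n for a few lengths. The helper name `count_combinations` and the constant `BASES` are illustrative names chosen for this example, not part of any library.

```python
# Illustrative constant: the four DNA nucleotide bases
BASES = "ATGC"

def count_combinations(n):
    """Return the number of possible base sequences of length n (4 ** n)."""
    return len(BASES) ** n

# Print the count for lengths 1 through 5
for n in range(1, 6):
    print(n, count_combinations(n))
```

For n = 2 and n = 3 this gives 16 and 64, matching the examples in the text.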
You can use the following Python code to generate all possible combinations of nucleotide bases of a given length.
Example 1: Calculate all combinations of two nucleotide bases
# import package
from itertools import product
# calculate all combinations of two bases
comb = [''.join(b) for b in product("ATGC", repeat=2)]
print(comb)
# output
['AA', 'AT', 'AG', 'AC', 'TA', 'TT', 'TG', 'TC', 'GA', 'GT', 'GG', 'GC',
'CA', 'CT', 'CG', 'CC']
There are 16 possible combinations of two nucleotide bases.
Example 2: Calculate all combinations of three nucleotide bases
# import package
from itertools import product
# calculate all combinations of three bases
comb = [''.join(b) for b in product("ATGC", repeat=3)]
print(comb)
# output
['AAA', 'AAT', 'AAG', 'AAC', 'ATA', 'ATT', 'ATG', 'ATC', 'AGA', 'AGT', 'AGG', 'AGC', 'ACA', 'ACT', 'ACG', 'ACC',
'TAA', 'TAT', 'TAG', 'TAC', 'TTA', 'TTT', 'TTG', 'TTC', 'TGA', 'TGT', 'TGG', 'TGC', 'TCA', 'TCT', 'TCG', 'TCC',
'GAA', 'GAT', 'GAG', 'GAC', 'GTA', 'GTT', 'GTG', 'GTC', 'GGA', 'GGT', 'GGG', 'GGC', 'GCA', 'GCT', 'GCG', 'GCC',
'CAA', 'CAT', 'CAG', 'CAC', 'CTA', 'CTT', 'CTG', 'CTC', 'CGA', 'CGT', 'CGG', 'CGC', 'CCA', 'CCT', 'CCG', 'CCC']
There are 64 possible combinations of three nucleotide bases.
Similarly, there are 256 combinations of four bases, and this number increases exponentially with the sequence length.
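Because the number of combinations grows so quickly, building the full list may become impractical for longer lengths. The following is a minimal sketch of one way to handle this, generating combinations lazily instead of storing them all at once. The function name `iter_combinations` is a hypothetical helper introduced for illustration; it only wraps `itertools.product`.

```python
from itertools import islice, product

def iter_combinations(length, bases="ATGC"):
    """Yield every base sequence of the given length, one at a time."""
    for combo in product(bases, repeat=length):
        yield "".join(combo)

# Look at the first five combinations of four bases without building the whole list
first_five = list(islice(iter_combinations(4), 5))
print(first_five)

# Count all combinations of four bases by iterating over the generator
total = sum(1 for _ in iter_combinations(4))
print(total)
```

With a generator, memory use stays small for any length, although the time needed to visit every combination still grows as 4 to the power of the length.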
This work is licensed under a Creative Commons Attribution 4.0 International License