Concatenate and split VCF files
What is a VCF file?
- VCF stands for Variant Call Format
- It is a text file (extension .vcf) that stores meta-information, marker data, and genotype data for genetic variants
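To make the three kinds of content concrete, here is a minimal sketch that separates them in plain Python. It relies on the standard VCF layout: meta-information lines start with `##`, a single column header line starts with `#CHROM`, and every remaining line is a tab-separated variant record. The sample content and the function name `parse_vcf_text` are made up for illustration and are not part of bioinfokit.

```python
# Hypothetical VCF content, inlined so the sketch is self-contained.
VCF_TEXT = """\
##fileformat=VCFv4.2
##source=example
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_1\tsample_2
chr1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1
chr2\t200\trs2\tC\tT\t60\tPASS\t.\tGT\t0/0\t0/1
"""


def parse_vcf_text(text):
    """Split VCF text into meta lines, column names, and per-variant records."""
    meta, columns, records = [], [], []
    for line in text.splitlines():
        if line.startswith("##"):
            meta.append(line)  # meta-information
        elif line.startswith("#"):
            columns = line.lstrip("#").split("\t")  # column header
        elif line.strip():
            # marker fields (CHROM, POS, ...) plus one genotype per sample
            records.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, records


meta, columns, records = parse_vcf_text(VCF_TEXT)
print(len(meta), columns[:5], records[0]["sample_1"])
```

The first eight columns describe the marker, and the columns after FORMAT hold the genotype of each sample.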
How to merge multiple VCF files?
- We will use bioinfokit v0.9.4 or later
- Check the bioinfokit documentation for installation instructions and usage details
Sometimes it is necessary to concatenate several VCF files before analysis because the genotype information is stored across multiple files (for example, you have a separate VCF file for every chromosome).
# I am using the interactive Python interpreter (Python 3.7.4)
# Go to the directory where all VCF files are stored. Make sure all files are uncompressed.
# Make sure the VCF files are uniform, for example, multiple VCF files
# generated from the same source dataset
>>> from bioinfokit.analys import marker
# Concatenate VCF files. Provide the file names as a single comma-separated string.
>>> marker.concatvcf("file_1.vcf,file_2.vcf,file_3.vcf,file_4.vcf")
# The merged VCF file will be saved in the same directory (concat_vcf.vcf)
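For readers who want to see what concatenation amounts to, the following is a rough sketch in plain Python. It is not bioinfokit's implementation of `concatvcf`; it only illustrates the general idea under the assumption that all files share the same header: keep the header lines of the first file and append the variant records of every file in order. The function name `concat_vcf_sketch`, the demo file contents, and the output name `merged_sketch.vcf` are hypothetical.

```python
import os
import tempfile


def concat_vcf_sketch(paths, out_path):
    """Keep the header of the first file, then append the records of every file."""
    with open(out_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as handle:
                for line in handle:
                    is_header = line.startswith("#")
                    if is_header and index > 0:
                        continue  # headers of later files are dropped
                    if line.strip():
                        # Normalise the line ending so a file without a final
                        # newline does not glue two records together.
                        out.write(line.rstrip("\n") + "\n")


# Tiny demo with two made-up per-chromosome files in a temporary directory.
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "file_1.vcf")
    second = os.path.join(tmp, "file_2.vcf")
    with open(first, "w") as fh:
        fh.write(HEADER + "chr1\t100\t.\tA\tG\t.\tPASS\t.\n")
    with open(second, "w") as fh:
        fh.write(HEADER + "chr2\t200\t.\tC\tT\t.\tPASS\t.")  # no final newline
    merged = os.path.join(tmp, "merged_sketch.vcf")
    concat_vcf_sketch([first, second], merged)
    with open(merged) as fh:
        merged_lines = fh.read().splitlines()

print(merged_lines)
```

This also shows why the input files need to be uniform: the sketch trusts the first header to describe every record that follows.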
Split VCF file by chromosome
Split a single VCF file containing variants for all chromosomes into individual VCF files, one for each chromosome.
>>> from bioinfokit.analys import marker
>>> marker.splitvcf(file="file.vcf")
# One VCF file per chromosome will be saved in the same directory
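The splitting step can be pictured with a similar plain-Python sketch. Again, this is not how bioinfokit implements `splitvcf`; it is a simplified illustration that groups records by their first column (CHROM) and writes each group to its own file with a copy of the original header. The function name `split_vcf_sketch` and the `<chromosome>.vcf` output naming are assumptions made for this example, and the sketch holds all records in memory, so it suits small files only.

```python
import os
import tempfile
from collections import defaultdict


def split_vcf_sketch(vcf_path, out_dir):
    """Write one VCF file per chromosome; return {chromosome: output path}."""
    header = []
    by_chrom = defaultdict(list)
    with open(vcf_path) as handle:
        for line in handle:
            line = line.rstrip("\n") + "\n"  # normalise the line ending
            if line.startswith("#"):
                header.append(line)
            elif line.strip():
                chrom = line.split("\t", 1)[0]  # first column is CHROM
                by_chrom[chrom].append(line)

    written = {}
    for chrom, lines in by_chrom.items():
        out_path = os.path.join(out_dir, f"{chrom}.vcf")  # hypothetical naming
        with open(out_path, "w") as out:
            out.writelines(header)
            out.writelines(lines)
        written[chrom] = out_path
    return written


# Tiny demo with a made-up three-record file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "file.vcf")
    with open(source, "w") as fh:
        fh.write("##fileformat=VCFv4.2\n")
        fh.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
        fh.write("chr1\t100\t.\tA\tG\t.\tPASS\t.\n")
        fh.write("chr2\t200\t.\tC\tT\t.\tPASS\t.\n")
        fh.write("chr1\t300\t.\tG\tA\t.\tPASS\t.\n")
    parts = split_vcf_sketch(source, tmp)
    counts = {}
    for chrom, path in parts.items():
        with open(path) as fh:
            counts[chrom] = sum(1 for row in fh if not row.startswith("#"))

print(sorted(parts), counts)
```

Each output file carries the full header, so it remains a valid VCF file on its own.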
This work is licensed under a Creative Commons Attribution 4.0 International License