Create a gene counts matrix from featureCounts
- featureCounts summarizes read counts for genomic features (e.g., exons) and meta-features (e.g., genes) from genome-mapped RNA-seq or genomic DNA-seq reads (SAM/BAM files).
- featureCounts uses genomic annotations in GTF or SAF format to define the features and meta-features to count.
For differential gene expression analysis, it is convenient to have the counts for all samples in a single file (a gene count matrix). You get this gene count matrix file directly when you run featureCounts on all mapped files at once.
# meta-feature (gene) level count
featureCounts -t 'exon' -g 'gene_id' -a annotation.gtf -T 10 -o counts.txt library1.bam library2.bam library3.bam
# use -f option for feature (exon) level count
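The combined counts.txt written by the command above is tab-separated: it starts with a `#` comment line recording the command, followed by a header with Geneid, five annotation columns (Chr, Start, End, Strand, Length), and one count column per BAM file. As a minimal sketch, assuming integer counts and that layout, the file can be reduced to a plain gene-by-sample matrix like this. The function name `read_count_matrix` and the example values are hypothetical and only for illustration.

```python
import csv
import io

# Annotation columns that featureCounts writes before the per-sample counts.
ANNOTATION_COLUMNS = ["Chr", "Start", "End", "Strand", "Length"]


def read_count_matrix(handle):
    """Return (sample_names, {gene_id: [counts]}) from featureCounts output.

    Hypothetical helper; assumes integer counts (no fractional counting).
    """
    # Skip the leading '#' line that records the program and command.
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    # Keep every column except the gene id and the annotation columns.
    keep = [i for i, name in enumerate(header)
            if i != 0 and name not in ANNOTATION_COLUMNS]
    samples = [header[i] for i in keep]
    matrix = {row[0]: [int(row[i]) for i in keep] for row in rows if row}
    return samples, matrix


# Made-up content in the featureCounts layout (values are illustrative only).
example = (
    "# Program:featureCounts; Command:...\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tlibrary1.bam\tlibrary2.bam\n"
    "geneA\tchr1\t100\t500\t+\t401\t12\t30\n"
    "geneB\tchr1\t900\t1500\t-\t601\t0\t4\n"
)
samples, matrix = read_count_matrix(io.StringIO(example))
print(samples)
print(matrix["geneA"])
```

To use it on a real file, pass an open file handle, for example `open("counts.txt")`, instead of the in-memory example.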
However, when you run featureCounts on each sample individually (for example, with large datasets), the counts for each sample are written to a separate text file.
To merge these individual count files into a single gene count matrix, we will use bioinfokit v2.0.5.
# run this Python code (in a Python interpreter) from a folder where all files are present
from bioinfokit.analys import HtsAna
# make sure all individual count files are in the same folder
# by default, it assumes each count file has a .txt extension
HtsAna.merge_featureCount()
See the bioinfokit documentation for detailed usage of HtsAna.merge_featureCount.
Once it runs successfully, the output file gene_matrix_count.csv appears in the same folder and contains the merged counts for all samples.
# gene_matrix_count.csv
Geneid,sample1.bam,sample2.bam,sample3.bam
PGSC0003DMG400015133,0,7,2
PGSC0003DMG400015132,72,95,155
PGSC0003DMG400022764,42,78,77
PGSC0003DMG400022799,2,3,5
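To make the merge step concrete, here is a minimal standard-library sketch of the same idea: read each per-sample file, take the gene id and the last (count) column, and join the samples by gene id. This is not bioinfokit's implementation. The function name `merge_counts`, the gene names, and the counts are hypothetical, and filling a gene that is missing from a sample with 0 is an assumption.

```python
import csv
import glob
import os
import tempfile


def merge_counts(folder, pattern="*.txt", gene_column="Geneid"):
    """Merge per-sample featureCounts files into (header, rows).

    Hypothetical helper for illustration; not the bioinfokit code.
    """
    merged = {}   # gene id -> {sample name: count}
    samples = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, newline="") as handle:
            # Skip the '#' comment line that featureCounts writes first.
            lines = (line for line in handle if not line.startswith("#"))
            reader = csv.DictReader(lines, delimiter="\t")
            sample = reader.fieldnames[-1]  # last column holds the counts
            for row in reader:
                merged.setdefault(row[gene_column], {})[sample] = row[sample]
        samples.append(sample)
    header = [gene_column] + samples
    # Assumption: a gene absent from one sample gets a count of 0.
    rows = [[gene] + [counts.get(s, "0") for s in samples]
            for gene, counts in merged.items()]
    return header, rows


# Build two tiny made-up per-sample files and merge them.
with tempfile.TemporaryDirectory() as folder:
    for name, counts in [("sample1", (0, 72)), ("sample2", (7, 95))]:
        with open(os.path.join(folder, name + ".txt"), "w") as out:
            out.write("# Program:featureCounts; Command:...\n")
            out.write("Geneid\tChr\tStart\tEnd\tStrand\tLength\t"
                      + name + ".bam\n")
            out.write("geneA\tchr1\t1\t100\t+\t100\t%d\n" % counts[0])
            out.write("geneB\tchr1\t200\t300\t-\t101\t%d\n" % counts[1])
    header, rows = merge_counts(folder)

print(",".join(header))
for row in rows:
    print(",".join(row))
```

The sketch sorts the file names so the sample columns come out in a predictable order, and it relies on every count file sharing the same gene id column name.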
This work is licensed under a Creative Commons Attribution 4.0 International License