Get Non-overlapping Portion Between Two Regions in bedtools
You can use the subtract command from bedtools to get the portions of the genomic intervals in one file that do not overlap the intervals in another file.
The basic syntax for bedtools subtract is:
bedtools subtract -a region1.bed -b region2.bed > non_overlapping_regions.bed
The following example explains how to use bedtools subtract to get the non-overlapping intervals between two region files.
Suppose you have two BED files (region1.bed and region2.bed) containing genomic intervals, and you want the portions of the intervals in region1.bed that do not overlap any interval in region2.bed.
region1.bed
cat region1.bed
Chr1 10 20
Chr1 40 70
Chr1 100 150
region2.bed
cat region2.bed
Chr1 15 20
Chr1 60 70
Chr1 120 180
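If you want to follow along, the two example files can be recreated from the command line. The sketch below assumes the columns are tab-separated, as the BED format expects (the listings above may display the separators as spaces), and it writes region1.bed and region2.bed into the current directory.

```shell
# Write the two example files with tab-separated columns (chrom, start, end).
printf 'Chr1\t10\t20\nChr1\t40\t70\nChr1\t100\t150\n' > region1.bed
printf 'Chr1\t15\t20\nChr1\t60\t70\nChr1\t120\t180\n' > region2.bed

# Sanity check: every line should have exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' region1.bed region2.bed \
  && echo "both files look tab-delimited"
```

A file that uses spaces instead of tabs is a common cause of parsing errors, so the check is worth running on your own BED files before subtracting.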
Get the non-overlapping portion of genomic intervals from region1.bed using bedtools subtract.
bedtools subtract looks for intervals in region2.bed (-b) that overlap intervals in region1.bed (-a) by at least 1 bp. Where an overlap is found, the overlapping portion is removed from the region1.bed interval and the remaining portion is reported. Intervals in region1.bed that have no overlap are reported unchanged.
bedtools subtract -a region1.bed -b region2.bed > non_overlapping_regions.bed
cat non_overlapping_regions.bed
# output
Chr1 10 15
Chr1 40 60
Chr1 100 120
You can see that only the non-overlapping portions of the intervals from the region1.bed (-a) file are reported.
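To make the subtraction logic concrete, here is a minimal awk sketch of the same idea. It is an illustration only, not how bedtools is implemented: it assumes tab-separated three-column files, assumes region2.bed is sorted by chromosome and start, and writes to a hypothetical file named sketch_non_overlapping.bed.

```shell
# Recreate the example inputs (tab-separated: chrom, start, end).
printf 'Chr1\t10\t20\nChr1\t40\t70\nChr1\t100\t150\n' > region1.bed
printf 'Chr1\t15\t20\nChr1\t60\t70\nChr1\t120\t180\n' > region2.bed

# Sketch of interval subtraction (A minus B).
# Assumption: region2.bed is sorted by chrom, then start.
awk 'BEGIN { FS = OFS = "\t" }
  # First file (region2.bed): store every B interval.
  NR == FNR { n++; bchr[n] = $1; bs[n] = $2 + 0; be[n] = $3 + 0; next }
  # Second file (region1.bed): trim each A interval against the stored B intervals.
  {
    cur = $2 + 0; stop = $3 + 0
    for (i = 1; i <= n; i++) {
      # Skip B intervals on another chromosome or without any overlap.
      if (bchr[i] != $1 || bs[i] >= stop || be[i] <= cur) continue
      if (bs[i] > cur) print $1, cur, bs[i]   # piece of A before this B interval
      cur = be[i]                             # continue after this B interval
    }
    if (cur < stop) print $1, cur, stop       # piece of A after the last overlap
  }' region2.bed region1.bed > sketch_non_overlapping.bed

cat sketch_non_overlapping.bed
```

For the example files, the sketch should produce the same three intervals as the bedtools subtract output above. For real data, bedtools remains the better choice, since it handles strand options, overlap fractions, and large files.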
By default, bedtools subtract treats an overlap of at least 1 bp as sufficient. You can set the minimum overlap required, as a fraction of each interval in the -a file, using the -f parameter.
For example, subtract the overlapping portion from the region1.bed (-a) intervals only when the overlap covers at least 50% of the region1.bed interval.
bedtools subtract -a region1.bed -b region2.bed -f 0.5 > non_overlapping_regions.bed
cat non_overlapping_regions.bed
# output
Chr1 10 15
Chr1 40 70
Chr1 100 120
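You can see that the first and third regions overlap by at least 50% of their length (50% and 60%, respectively), so their overlapping portion is removed and the remainder is reported. The second region overlaps by only about 33% (10 of 30 bp), which is below the threshold, so it is reported unchanged.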
Similarly, you can set the minimum overlap required, as a fraction of each interval in the -b file, using the -F parameter.
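To see which interval pairs would pass a given -f or -F threshold, it can help to compute the overlap fractions directly. The awk sketch below is an illustration under a few assumptions: tab-separated three-column input, non-zero interval lengths, and a hypothetical output file named overlap_fractions.tsv. The column names frac_of_a and frac_of_b are labels chosen here, not bedtools output.

```shell
# Recreate the example inputs (tab-separated: chrom, start, end).
printf 'Chr1\t10\t20\nChr1\t40\t70\nChr1\t100\t150\n' > region1.bed
printf 'Chr1\t15\t20\nChr1\t60\t70\nChr1\t120\t180\n' > region2.bed

# For every overlapping A/B pair, report the overlap length and the
# fraction of A (compared against -f) and of B (compared against -F).
awk 'BEGIN {
    FS = OFS = "\t"
    print "chrom", "a_start", "a_end", "b_start", "b_end", "overlap_bp", "frac_of_a", "frac_of_b"
  }
  # First file (region2.bed): store every B interval.
  NR == FNR { n++; bchr[n] = $1; bs[n] = $2 + 0; be[n] = $3 + 0; next }
  # Second file (region1.bed): compare each A interval with each B interval.
  {
    as = $2 + 0; ae = $3 + 0
    for (i = 1; i <= n; i++) {
      lo = (as > bs[i]) ? as : bs[i]          # start of the shared segment
      hi = (ae < be[i]) ? ae : be[i]          # end of the shared segment
      if (bchr[i] != $1 || hi <= lo) continue # different chromosome or no overlap
      ov = hi - lo
      printf "%s\t%d\t%d\t%d\t%d\t%d\t%.2f\t%.2f\n", \
        $1, as, ae, bs[i], be[i], ov, ov / (ae - as), ov / (be[i] - bs[i])
    }
  }' region2.bed region1.bed > overlap_fractions.tsv

cat overlap_fractions.tsv
```

In this example, frac_of_a is 0.50, 0.33, and 0.60 for the three pairs, which is why -f 0.5 leaves the second interval untouched. The frac_of_b values are 1.00, 1.00, and 0.50, so -F 0.5 would be expected to give the same result as the default run, while a stricter value such as -F 0.6 would be expected to leave the third interval unchanged.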
This work is licensed under a Creative Commons Attribution 4.0 International License