Get Non-overlapping Portion Between Two Regions in bedtools

Renesh Bedre

You can use the bedtools subtract command to get the portions of genomic intervals in one file that do not overlap the intervals in another file.

The basic syntax for bedtools subtract is:

bedtools subtract -a region1.bed -b region2.bed > non_overlapping_regions.bed

The following example shows how to use bedtools subtract to get the non-overlapping intervals between two region files.

Suppose you have two BED files (region1.bed and region2.bed) containing genomic intervals, and you want the portions of the intervals in region1.bed that do not overlap any region in region2.bed.

region1.bed

cat region1.bed

Chr1    10      20
Chr1    40      70
Chr1    100     150

region2.bed

cat region2.bed

Chr1    15      20
Chr1    60      70
Chr1    120     180
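
To follow along, the two example files can be recreated from the shell. This is a minimal sketch that assumes a POSIX shell and writes both files into the current directory. bedtools expects BED files to be tab-delimited, so printf with \t is used here instead of typing spaces.

```shell
# Recreate the example inputs as tab-delimited BED files (chrom, start, end).
printf 'Chr1\t10\t20\nChr1\t40\t70\nChr1\t100\t150\n' > region1.bed
printf 'Chr1\t15\t20\nChr1\t60\t70\nChr1\t120\t180\n' > region2.bed

# Sanity check: every line should have exactly 3 tab-separated columns.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad }' region1.bed region2.bed \
    && echo "both files look OK"
```

If the check prints nothing, one of the files probably contains spaces where tabs are expected.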

Get the non-overlapping portion of genomic intervals from region1.bed using bedtools subtract.

bedtools subtract finds the regions in region2.bed (-b) that overlap, by at least 1 bp, the regions in region1.bed (-a). Where there is an overlap, the overlapping portion is removed from the region1.bed interval and the remaining portion is reported. Intervals in region1.bed with no overlap are reported unchanged.

bedtools subtract -a region1.bed -b region2.bed > non_overlapping_regions.bed

cat non_overlapping_regions.bed

# output
Chr1    10      15
Chr1    40      60
Chr1    100     120

You can see that only the non-overlapping portions of the intervals from the region1.bed (-a) file are reported.
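
The arithmetic behind this output can be reproduced by hand. The awk sketch below is only an illustration and not how bedtools is implemented: it assumes that line N of region1.bed overlaps at most line N of region2.bed, which holds for this example but not for real data. The output file name manual_subtract.bed is made up for the illustration.

```shell
# Recreate the example inputs (tab-delimited) so the snippet runs on its own.
printf 'Chr1\t10\t20\nChr1\t40\t70\nChr1\t100\t150\n' > region1.bed
printf 'Chr1\t15\t20\nChr1\t60\t70\nChr1\t120\t180\n' > region2.bed

# After paste: $1-$3 = A interval, $4-$6 = B interval from the same line number.
paste region1.bed region2.bed | awk -F'\t' 'BEGIN { OFS = "\t" } {
    # No overlap (different chromosome or disjoint coordinates): keep A as is.
    if ($1 != $4 || $5 >= $3 || $6 <= $2) { print $1, $2, $3; next }
    if ($5 > $2) print $1, $2, $5    # part of A to the left of B
    if ($6 < $3) print $1, $6, $3    # part of A to the right of B
}' > manual_subtract.bed
cat manual_subtract.bed
```

For the example files this prints the same three intervals as the bedtools output above. A -b interval that falls strictly inside an -a interval would produce two lines, one for each side.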

By default, bedtools subtract treats an overlap of at least 1 bp as an overlapping region. You can set the minimum required overlap, as a fraction of each -a interval, using the -f parameter.

For example, subtract the overlapping portion from the region1.bed (-a) intervals only when the overlap covers at least 50% of the region1.bed interval.

bedtools subtract -a region1.bed -b region2.bed -f 0.5 > non_overlapping_regions.bed 

cat non_overlapping_regions.bed

# output
Chr1    10      15
Chr1    40      70
Chr1    100     120

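You can see that the first and third regions meet the 50% threshold (5 of 10 bp and 30 of 50 bp overlap, respectively), so only their non-overlapping portions are reported. The second region overlaps by only 10 of 30 bp (about 33%), so it is reported unchanged.

Similarly, you can set the minimum required overlap as a fraction of each -b interval using the -F parameter.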
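
To see which intervals pass a given -f or -F threshold, the overlap fractions can be computed by hand. The sketch below is an illustration only, not bedtools itself: it assumes that line N of region1.bed overlaps only line N of region2.bed (true for this example, not in general), and the output file name overlap_fractions.tsv and the column labels frac_A and frac_B are made up for the illustration.

```shell
# Recreate the example inputs (tab-delimited) so the snippet runs on its own.
printf 'Chr1\t10\t20\nChr1\t40\t70\nChr1\t100\t150\n' > region1.bed
printf 'Chr1\t15\t20\nChr1\t60\t70\nChr1\t120\t180\n' > region2.bed

# After paste: $1-$3 = A interval, $4-$6 = B interval from the same line number.
paste region1.bed region2.bed | awk -F'\t' '{
    s = ($2 > $5) ? $2 : $5          # overlap start = larger of the two starts
    e = ($3 < $6) ? $3 : $6          # overlap end = smaller of the two ends
    ov = (e > s) ? e - s : 0         # overlap length in bp, 0 if disjoint
    # frac_A is what -f is compared against, frac_B is what -F is compared against.
    printf "%s:%d-%d\toverlap=%d\tfrac_A=%.2f\tfrac_B=%.2f\n",
           $1, $2, $3, ov, ov / ($3 - $2), ov / ($6 - $5)
}' > overlap_fractions.tsv
cat overlap_fractions.tsv
```

For the example files, frac_A is 0.50, 0.33, and 0.60, which matches the -f 0.5 result where only the second interval is left intact. frac_B is 1.00, 1.00, and 0.50, so all three pairs meet a 50% threshold on the -b side, and bedtools subtract -a region1.bed -b region2.bed -F 0.5 would be expected to give the same result as the default run for this data.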

This work is licensed under a Creative Commons Attribution 4.0 International License
