Enhancer-like regions based on DNase and H3K27ac signals

DNase hypersensitivity and the histone modification H3K27ac are well-known indicators of enhancers. We developed an unsupervised method that combines DNase and H3K27ac signals in the same cell type to predict enhancer-like regions across ENCODE cell and tissue types. We compared methods that anchored predictions of enhancer-like regions on DNase or H3K27ac peaks, called using the ENCODE uniform processing pipelines for DNase-seq and histone mark ChIP-seq data. We also compared various schemes for ranking these predictions, such as ranking by p-value or signal. We used regions from the VISTA database as the gold standard; VISTA regions have been experimentally tested for enhancer activity in transgenic mouse assays. To account for the high ratio of negative regions to positive regions in the VISTA database, we used the area under the precision-recall curve (AU-PR) to evaluate method performance.
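As an illustration of the evaluation metric, the following is a minimal sketch that estimates AU-PR as average precision over a ranked list of regions. The function name `average_precision` and the input layout (one score and one 0/1 VISTA-style label per region) are assumptions made for this example; the exact AU-PR estimator used in the evaluation is not specified here, and tied scores are not treated specially.

```python
def average_precision(scores, labels):
    """Estimate the area under the precision-recall curve as average precision.

    scores: one number per region; higher means ranked earlier.
    labels: 1 if the region tested positive for enhancer activity, else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive region")

    # Visit regions from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos
```

Because the metric averages precision only at the ranks where positives occur, it is not inflated by a large number of correctly low-ranked negatives, which is why it suits a gold standard with many more negative than positive regions.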

Across four mouse tissues (midbrain, hindbrain, neural tube, and limb) at embryonic day 11.5 (E11.5), anchoring predictions on DNase peaks outperformed anchoring predictions on H3K27ac peaks. We also found that ranking predictions by a combination of H3K27ac signal and DNase signal performed best (Figure 1).

The best-performing method anchors predictions on DNase peaks and then ranks these predictions by the average of the ranks of DNase and H3K27ac signals (Figure 2). We use a smaller window (500 bp) for ranking DNase signals than for ranking H3K27ac signals (2 kb), in accordance with the different peak widths of these two types of signals. After ranking the regions anchored on DNase peaks, we predict their boundaries using the overlapping H3K27ac peaks. We denote the top 20,000 non-redundant (i.e., regions with multiple highly ranked DNase peaks are counted only once), TSS-distal (more than 2 kb away) regions as enhancer-like regions. We retain all TSS-proximal regions ranked above the 20,000th enhancer-like region, as they may be promoters with enhancer-like activities.
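The ranking step can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline's actual code: the dictionary keys (`name`, `dnase`, `h3k27ac`, `tss_dist`) and both function names are hypothetical, the signals are assumed to have already been computed over the 500 bp and 2 kb windows, and the redundancy filtering and boundary assignment are omitted.

```python
def rank_descending(values):
    """Return 1-based ranks, with rank 1 for the largest value.

    Ties are broken by input order (an assumption made for simplicity).
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_enhancer_like(peaks, top_n=20000, tss_cutoff=2000):
    """Rank DNase-anchored peaks by the average of their two signal ranks.

    peaks: list of dicts with hypothetical keys
        name     - peak identifier
        dnase    - DNase signal in the window around the peak
        h3k27ac  - H3K27ac signal in the window around the peak
        tss_dist - distance in bp to the nearest TSS
    Returns (distal, proximal): names of the top_n TSS-distal peaks, and
    names of the TSS-proximal peaks ranked above the last retained distal peak.
    """
    dnase_ranks = rank_descending([p["dnase"] for p in peaks])
    k27_ranks = rank_descending([p["h3k27ac"] for p in peaks])

    # A lower average rank means stronger combined evidence.
    scored = sorted(
        zip(peaks, dnase_ranks, k27_ranks),
        key=lambda t: (t[1] + t[2]) / 2,
    )

    distal, proximal = [], []
    for peak, _, _ in scored:
        if len(distal) == top_n:
            break  # everything below the top_n-th distal peak is dropped
        if peak["tss_dist"] > tss_cutoff:
            distal.append(peak["name"])
        else:
            proximal.append(peak["name"])
    return distal, proximal
```

Averaging ranks rather than raw signals avoids having to put DNase and H3K27ac signal values on a common scale.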

We have applied this method to 47 human cell types and 14 mouse cell types with both DNase and H3K27ac data generated by the ENCODE and Roadmap Epigenomics consortia. For cell and tissue types with only H3K27ac or only DNase data, we simply rank the peaks using the available data and make predictions of enhancer-like regions. For cell types with only DNase data, we estimate the boundaries of the enhancer-like regions using a set of master H3K27ac peaks, obtained by merging the H3K27ac peaks from 120 cell and tissue types. These regions are shown in a lighter color, as demonstrated in Figure 3. Users can query these enhancer-like regions by genomic location, nearby gene, or SNP, and visualize them in the UCSC and WashU genome browsers. We have also made these regions available for download.
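One way to picture the master-peak step is sketched below, assuming peaks are given as `(chrom, start, end)` tuples in half-open, BED-style coordinates. The function names are hypothetical, merging book-ended intervals is a choice made for this example, and returning the DNase peak unchanged when no master peak overlaps it is an assumption; the text above does not say how that case is handled.

```python
def merge_peaks(peaks):
    """Merge overlapping or book-ended (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def boundaries_for(dnase_peak, master_peaks):
    """Estimate region boundaries from the master peaks overlapping a DNase peak.

    Falls back to the DNase peak itself when nothing overlaps (an assumption).
    """
    chrom, start, end = dnase_peak
    hits = [
        m for m in master_peaks
        if m[0] == chrom and m[1] < end and start < m[2]
    ]
    if not hits:
        return dnase_peak
    return (chrom, min(h[1] for h in hits), max(h[2] for h in hits))
```

For example, pooling the H3K27ac peaks from every cell type into one list and passing it to `merge_peaks` yields a single non-overlapping master set, which `boundaries_for` can then consult for each ranked DNase peak.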

Figure 1. Example PR curves used to determine the best-performing method for predicting enhancer-like regions. Ranking peaks by the average rank of DNase and H3K27ac signals consistently performed best.

Figure 2. Method for predicting enhancer-like regions.

Figure 3.