STARRPeaker: Uniform processing and accurate identification of whole human STARR-seq active regions
2019
High-throughput reporter assays, such as self-transcribing active regulatory region sequencing (STARR-seq), allow for unbiased and quantitative assessment of enhancers at the genome-wide level. To cover the size of the human genome, recent advances in STARR-seq technology have employed more complex genomic libraries and increased sequencing depths. These advances necessitate a reliable processing pipeline and peak-calling algorithm. Most STARR-seq studies have relied on chromatin immunoprecipitation sequencing (ChIP-seq) processing pipelines to identify peak regions. Here, however, we highlight key differences in the processing of STARR-seq versus ChIP-seq data. STARR-seq uses transcribed RNA to measure enhancer activity, so determining the basal transcription rate is important. Further, STARR-seq coverage is non-uniform, overdispersed, and often confounded by sequencing biases such as GC content and mappability. We observed a correlation between RNA thermodynamic stability and STARR-seq RNA readout, suggesting that STARR-seq might be sensitive to RNA secondary structure and stability. Considering these findings, we developed a statistical framework for uniformly processing STARR-seq data: STARRPeaker. We applied our method to two whole human genome STARR-seq experiments, in HepG2 and K562 cells. Our method identifies highly reproducible and epigenetically active enhancers across replicates. Moreover, STARRPeaker outperforms other peak callers in identifying known enhancers. Thus, our framework, optimized for processing STARR-seq data, accurately characterizes cell-type-specific enhancers while addressing potential confounders.
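To make the term "overdispersed" concrete, here is a minimal Python sketch of how one might check whether per-bin read counts vary more than a Poisson model allows, and how an upper-tail probability could be computed under a negative binomial alternative. The abstract does not specify the model used, so this is not STARRPeaker's method. The counts, the function names, and the method-of-moments fit are all assumptions chosen for illustration, and the sketch ignores covariates such as GC content and mappability.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio: near 1 for Poisson counts, above 1 if overdispersed."""
    return variance(counts) / mean(counts)


def nb_size_moments(counts):
    """Method-of-moments estimate of the negative binomial size r,
    from var = mu + mu**2 / r. Returns None when counts are not overdispersed."""
    m, v = mean(counts), variance(counts)
    if v <= m:
        return None
    return m * m / (v - m)


def nb_upper_tail(k, mu, r):
    """P(X >= k) for a negative binomial with mean mu and size r."""
    p = r / (r + mu)  # success probability in the (r, p) parameterization
    cdf = 0.0
    for x in range(k):
        log_pmf = (math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
                   + r * math.log(p) + x * math.log1p(-p))
        cdf += math.exp(log_pmf)
    return max(0.0, 1.0 - cdf)


# Hypothetical read counts for ten genomic bins (illustrative only, not real data).
counts = [2, 0, 5, 1, 30, 3, 0, 4, 1, 2]
mu = mean(counts)
r = nb_size_moments(counts)
tail_30 = nb_upper_tail(30, mu, r)  # how surprising the bin with 30 reads is
```

A dispersion index well above 1, as in this toy example, is the kind of signal that argues against a plain Poisson background model. A full analysis would estimate the expected count per bin from covariates rather than from a single global mean.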