STARRPeaker: Uniform processing and accurate identification of whole human STARR-seq active regions

2019
High-throughput reporter assays, such as self-transcribing active regulatory region sequencing (STARR-seq), allow for unbiased and quantitative assessment of enhancers at the genome-wide level. To cover the size of the human genome, recent advancements in STARR-seq technology have employed more complex genomic libraries and increased sequencing depths. These advances necessitate a reliable processing pipeline and peak-calling algorithm. Most STARR-seq studies have relied on chromatin immunoprecipitation sequencing (ChIP-seq) processing pipelines to identify peak regions. However, here we highlight key differences in the processing of STARR-seq versus ChIP-seq data. STARR-seq uses transcribed RNA to measure enhancer activity, so determining the basal transcription rate is important. Further, STARR-seq coverage is non-uniform, overdispersed, and often confounded by sequencing biases such as GC content and mappability. We observed a correlation between RNA thermodynamic stability and STARR-seq RNA readout, suggesting that STARR-seq might be sensitive to RNA secondary structure and stability. Considering these findings, we developed a statistical framework for uniformly processing STARR-seq data: STARRPeaker. We applied our method to two whole human genome STARR-seq experiments: HepG2 and K562. Our method identifies highly reproducible and epigenetically active enhancers across replicates. Moreover, STARRPeaker outperforms other peak callers in terms of identifying known enhancers. Thus, our framework, optimized for processing STARR-seq data, accurately characterizes cell-type-specific enhancers while addressing potential confounders.