STARRPeaker: uniform processing and accurate identification of STARR-seq active regions.

Donghoon Lee, Manman Shi, Jennifer R. Moran, Martha Wall, Jing Zhang, Jason Liu, Dominic Fitzgerald, Yasuhiro Kyono, Lijia Ma, Kevin P. White, Mark Gerstein

2020

STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. With this increased complexity and depth, coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases such as GC content. Furthermore, the STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed STARRPeaker, a negative binomial regression framework for uniformly processing STARR-seq data. Moreover, to aid our effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to call enhancers in them comprehensively and without bias.
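
To make the idea of a negative binomial regression over confounders concrete, the following is a minimal sketch and not the STARRPeaker implementation. It assumes a log-linear mean in which the expected RNA count of a genomic bin is the input (DNA) coverage scaled by terms for GC content and RNA folding energy, and it tests the observed count against a negative binomial with a fixed dispersion. The function names, the coefficient values, and the dispersion are hypothetical and chosen only for illustration; in practice they would be fitted to the data.

```python
import math


def nb_log_pmf(k, mu, alpha):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion alpha, so that variance = mu + alpha * mu**2."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(k, mu, alpha):
    """P(X >= k): chance of seeing k or more reads under the null model.
    Computed as 1 - P(X < k), so precision is limited for very small tails."""
    lower = sum(math.exp(nb_log_pmf(i, mu, alpha)) for i in range(k))
    return max(0.0, 1.0 - lower)


def expected_count(input_cov, gc, fold_energy, coef):
    """Hypothetical log-linear mean: input coverage acts as an offset,
    GC content and folding energy as covariates."""
    eta = (coef["intercept"]
           + coef["gc"] * gc
           + coef["fold_energy"] * fold_energy)
    return input_cov * math.exp(eta)


# Illustrative (made-up) coefficients and dispersion; not fitted values.
COEF = {"intercept": -0.2, "gc": 0.8, "fold_energy": -0.01}
ALPHA = 0.3

# Each bin: (observed RNA count, input coverage, GC fraction, folding energy)
bins = [
    (12, 10.0, 0.45, -20.0),
    (60, 10.0, 0.45, -20.0),
]

for observed, input_cov, gc, energy in bins:
    mu = expected_count(input_cov, gc, energy, COEF)
    p_value = nb_upper_tail(observed, mu, ALPHA)
    print(f"observed={observed} expected={mu:.2f} p={p_value:.3g}")
```

The dispersion term is what lets the model absorb overdispersed coverage: as `alpha` approaches zero the distribution approaches a Poisson, while larger values widen the null and make a given excess of reads less surprising.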

Keywords:

Computational biology
Enhancer
Human genetics
Biology
Negative binomial distribution
Nucleic acid secondary structure
Human cell
GC-content
STARR-seq
Genomic library
