Whole Genome and Exome Sequencing Reference Datasets from a Multi-center and Cross-platform Benchmark Study

2021
With the rapid advancement of sequencing technologies over the past decade, next-generation sequencing (NGS) has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from errors or artifacts introduced during NGS processes or data analysis. There is therefore an urgent need for best practices in NGS-based cancer mutation detection and for standard reference data sets with which to systematically benchmark sequencing platforms, library protocols, and bioinformatics pipelines, and to measure accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples: a human triple-negative breast cancer cell line and a matched normal cell line derived from B lymphocytes. We generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using 16 NGS library preparation protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the paired reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets, generated from well-characterized reference samples, represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines and for cancer genomics studies.

Background & Summary
NGS technology has become a powerful tool for precision medicine. As more researchers and clinicians use NGS to identify clinically actionable mutations in cancer patients and to establish targeted therapies based on the patients' genetic makeup or the genetic variants of their tumor [1], there is a critical need to fully understand the many variables that affect NGS analysis output.
The rapidly growing number of sample processing protocols, library preparation methods, sequencing platforms, and bioinformatics pipelines for detecting mutations in cancer genomes presents great technical challenges for the accuracy and reproducibility of NGS-based mutation detection. To investigate how these experimental and analytical elements may affect mutation detection accuracy, we recently carried out a comprehensive benchmarking study using both whole-genome sequencing (WGS) and whole-exome sequencing (WES) data sets generated from two well-characterized reference samples: a human breast cancer cell line (HCC1395) and a B-lymphocyte cell line (HCC1395BL) derived from the same donor (NBT-RS47789). We generated WGS and WES data using various NGS library preparation protocols and seven NGS platforms at six centers (NBT-A46164B).