High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators

2017
We developed a cheminformatics pipeline for the fully automated selection and extraction of high-quality protein-bound ligand conformations from X-ray structural data. The pipeline evaluates the validity and accuracy of the 3D structures of small molecules according to multiple criteria, including their fit to the electron density and their physicochemical and structural properties. Using this approach, we compiled two high-quality datasets from the Protein Data Bank (PDB): a comprehensive dataset and a diversified subset of 4626 and 2912 structures, respectively. The datasets were applied to benchmarking seven freely available conformer ensemble generators: Balloon (two different algorithms), the RDKit standard conformer ensemble generator, the Experimental-Torsion basic Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK. Substantial differences in the performance of the individual algorithms were observed, with RDKit and ETKDG generally achieving a favorable balance of accur...
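As an illustration of how such a benchmark can be scored, the sketch below computes, for each ligand, the smallest deviation between the experimental conformation and any conformer in a generated ensemble, and then the fraction of ligands reproduced within a cutoff. It assumes that accuracy is measured as a heavy-atom RMSD, that all conformers share the reference atom ordering, and that they have already been superposed onto the reference; the abstract states none of these details. The function names `rmsd`, `min_rmsd` and `success_rate` are hypothetical and do not belong to any of the benchmarked tools.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two coordinate lists with identical atom ordering.

    Assumption: the structures are already superposed, so no alignment
    (e.g. Kabsch) is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def min_rmsd(reference, ensemble):
    """Smallest RMSD between the reference and any conformer in the ensemble."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one conformer")
    return min(rmsd(reference, conformer) for conformer in ensemble)


def success_rate(min_rmsds, threshold):
    """Fraction of ligands whose best conformer lies within the threshold."""
    if not min_rmsds:
        raise ValueError("no ligands to evaluate")
    return sum(1 for value in min_rmsds if value <= threshold) / len(min_rmsds)


if __name__ == "__main__":
    # Toy two-atom "ligand": one conformer matches, one is shifted by 1 along z.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ensemble = [
        [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    ]
    print(min_rmsd(reference, ensemble))
    print(success_rate([0.0, 1.0, 2.5], threshold=1.0))
```

In a real benchmark the superposition and symmetry-aware atom matching would need to be handled by a cheminformatics toolkit before this scoring step; the sketch deliberately leaves that out.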