hcga: Highly Comparative Graph Analysis for network phenotyping

2020
ABSTRACT: Networks are widely used as mathematical models of complex systems across many scientific disciplines, not only in biology and medicine but also in the social sciences, physics, computing and engineering. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In the analysis of real-world graphs, it is crucial to systematically integrate a large number of diverse graph features in order to characterise and classify networks, as well as to aid network-based scientific discovery. In this paper, we introduce hcga, a framework for highly comparative analysis of graph data sets that computes several thousand graph features from any given network. hcga also offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterisation of graph data sets. We show that hcga outperforms other methodologies on supervised classification tasks on benchmark data sets whilst retaining the interpretability of network features. We also illustrate how hcga can be used for network-based discovery through two examples where data is naturally represented as graphs: the clustering of a data set of images of neuronal morphologies, and a regression problem to predict charge transfer in organic semiconductors based on their structure. hcga is an open platform that can be expanded to include further graph properties and statistical learning tools, allowing researchers to leverage the wide breadth of graph-theoretical research to quantitatively analyse and draw insights from network data.