A Kernel-Based Change Detection Method to Map Shifts in Phytoplankton Communities Measured by Flow Cytometry

2020
Automated, ship-board flow cytometers provide high-resolution maps of phytoplankton composition over large swaths of the world's oceans. They therefore pave the way for understanding how environmental conditions shape community structure. Identifying community changes along a cruise transect commonly involves segmenting the data into distinct regions. However, existing segmentation methods are generally not applicable to flow cytometry data, because these data are recorded as "point clouds": hundreds or thousands of particles are measured during each time interval. Moreover, nonparametric segmentation methods, which do not rely on prior knowledge of the number of species, are desirable for mapping community shifts.

We present CytoSegmenter, a kernel-based change-point estimation method for segmenting point cloud data that makes no parametric assumptions about the data distributions. Our method relies on a Hilbertian embedding of point clouds that allows us to work with point cloud data much as we would with vectorial data. The change-point locations can then be found using an efficient dynamic programming algorithm. The method can be used to segment long series of underway flow cytometry data automatically.

Through an analysis of 12 cruises, we demonstrate that CytoSegmenter locates abrupt changes in phytoplankton community structure. We show that these changes generally coincide with changes in ocean temperature and salinity. We also illustrate how the main parameter of CytoSegmenter can be easily calibrated using limited auxiliary annotated data.

CytoSegmenter is publicly available and implemented in Python. The method is generally applicable to segmenting series of point cloud data from any domain, and it readily scales to thousands of point clouds, each containing thousands of points. For underway flow cytometry data, it does not require prior clustering of particles to define taxon labels, which eliminates a potential source of error. This represents an important advance in automating the analysis of the large datasets now emerging in biological oceanography and other fields. It also opens the possibility of applying the approach during research cruises.
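The two ideas described above are embedding each point cloud in a Hilbert space and then locating change points by dynamic programming. They can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CytoSegmenter's implementation or API. It assumes a Gaussian kernel on per-particle measurements with a hand-picked bandwidth. It uses kernel mean embeddings, so the inner product between two clouds is the average pairwise kernel value. It also fixes the number of change points in advance rather than calibrating it from annotated data. The function names `gaussian_gram`, `cloud_gram`, and `segment` are hypothetical.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for all pairs (assumed kernel)."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def cloud_gram(clouds, bandwidth=1.0):
    """Inner products <mu_i, mu_j> between kernel mean embeddings of point clouds.

    Each cloud is an (n_i, d) array of particle measurements. Its mean embedding
    mu_i = (1/n_i) * sum_x k(x, .) turns the cloud into a single Hilbert-space
    vector, so <mu_i, mu_j> is the average kernel value over all point pairs.
    """
    T = len(clouds)
    G = np.empty((T, T))
    for i in range(T):
        for j in range(i, T):
            G[i, j] = G[j, i] = gaussian_gram(clouds[i], clouds[j], bandwidth).mean()
    return G


def segment(G, n_changes):
    """Exact least-squares segmentation of the embedded series by dynamic programming.

    The cost of segment [s, e) is its within-segment scatter in the embedding space:
        sum_t G[t, t] - (1 / (e - s)) * sum_{t, u in [s, e)} G[t, u].
    Returns the change-point indices (start positions of segments 2, 3, ...).
    """
    T = G.shape[0]
    # 2-D prefix sums make every block sum of G, and hence every cost, O(1).
    P = np.zeros((T + 1, T + 1))
    P[1:, 1:] = G.cumsum(axis=0).cumsum(axis=1)
    diag = np.concatenate([[0.0], np.cumsum(np.diag(G))])

    def cost(s, e):
        block = P[e, e] - P[s, e] - P[e, s] + P[s, s]
        return diag[e] - diag[s] - block / (e - s)

    K = n_changes + 1  # number of segments
    best = np.full((K + 1, T + 1), np.inf)  # best[k, e]: first e clouds in k segments
    arg = np.zeros((K + 1, T + 1), dtype=int)  # start of the last segment
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for e in range(k, T + 1):
            for s in range(k - 1, e):
                val = best[k - 1, s] + cost(s, e)
                if val < best[k, e]:
                    best[k, e], arg[k, e] = val, s
    # Backtrack from the full series to recover each segment's start.
    starts, e = [], T
    for k in range(K, 0, -1):
        e = int(arg[k, e])
        starts.append(e)
    return sorted(b for b in starts if b > 0)


# Usage: 20 synthetic point clouds of 2-D particle measurements whose
# distribution shifts after the 10th time interval.
rng = np.random.default_rng(0)
clouds = [rng.normal(0.0, 1.0, size=(200, 2)) for _ in range(10)]
clouds += [rng.normal(3.0, 1.0, size=(200, 2)) for _ in range(10)]
G = cloud_gram(clouds, bandwidth=1.0)
change_points = segment(G, n_changes=1)
print(change_points)  # → [10]
```

Once the T × T Gram matrix between clouds is built, the prefix sums make each segment cost constant-time. The dynamic program then runs in O(K T²) for T point clouds and K segments. In this sketch, the bandwidth and the number of change points are the knobs one would tune. The abstract states only that CytoSegmenter has a main parameter calibrated from annotated data, and this sketch does not claim which parameter that is.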