Scanflow: an end-to-end agent-based autonomic ML workflow manager for clusters

2021 
Machine Learning (ML) is more than just training models, the whole life-cycle must be considered. Once deployed, a ML model needs to be constantly managed, supervised and debugged to guarantee its availability, validity and robustness in dynamic contexts. This demonstration presents an agent-based ML workflow manager so-called Scanflow1, which enables autonomic management and supervision of the end-to-end life-cycle of ML workflows on distributed clusters. The case study on a MNIST project2 shows that different teams can collaborate using Scanflow within a ML project at different phases, and the effectiveness of agents to maintain the model accuracy and throughput of the model serving while running in production.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []
    Baidu
    map