CSER: Communication-efficient SGD with Error Reset

Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

2020

The scalability of distributed Stochastic Gradient Descent (SGD) is currently limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. CSER rests on two key ideas. First, a new technique called "error reset" adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging the advantages of both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that, when combined with highly aggressive compressors, the CSER algorithms (i) cause no loss of accuracy and (ii) accelerate training by nearly $10\times$ for CIFAR-100 and by $4.5\times$ for ImageNet.
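
To make the "error reset" idea more concrete, the following is a minimal, simplified sketch and not the paper's exact algorithm. It assumes a top-k gradient compressor, a toy quadratic loss per worker, and a reset step that averages the residual errors without compression. The names `top_k`, `mean`, `train`, `targets`, and `reset_every` are hypothetical and chosen only for illustration.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    order = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)
    keep = set(order[:k])
    return [v if j in keep else 0.0 for j, v in enumerate(vec)]


def mean(vectors):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(targets, steps=40, reset_every=4, lr=0.1, k=1):
    """Toy simulation: worker i minimizes 0.5 * ||x - targets[i]||^2."""
    n, d = len(targets), len(targets[0])
    x = [[0.0] * d for _ in range(n)]  # local models (they drift apart)
    e = [[0.0] * d for _ in range(n)]  # local residual errors
    for t in range(1, steps + 1):
        grads = [[xj - tj for xj, tj in zip(x[i], targets[i])]
                 for i in range(n)]
        sent = [top_k(g, k) for g in grads]  # compressed part, communicated
        sent_avg = mean(sent)                # partial gradient synchronization
        for i in range(n):
            for j in range(d):
                r = grads[i][j] - sent[i][j]  # residual, applied locally only
                x[i][j] -= lr * (sent_avg[j] + r)
                e[i][j] -= lr * r
        if t % reset_every == 0:
            # Error reset (assumption: uncompressed averaging of the errors).
            e_avg = mean(e)
            for i in range(n):
                x[i] = [xj - ej + aj for xj, ej, aj in zip(x[i], e[i], e_avg)]
                e[i] = [0.0] * d
    return x, e
```

In this sketch each local model equals a shared component plus that worker's own residual error, so the models bifurcate between resets and coincide again right after one. Calling `train([[1.0, 2.0, 3.0], [3.0, 0.0, 1.0]])` should leave both workers with the same model, close to the average of the two targets.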

Keywords:

Periodic graph (geometry)
Mathematical optimization
Algorithm
partial synchronization
Stochastic gradient descent
Residual
Mathematics
Scalability
Convergence (routing)
