with Kun Yuan, Bicheng Ying, and Ali H. Sayed, IEEE Transactions on Signal Processing 67 (2), 351-366. doi: 10.1109/TSP.2018.2872003.
In decentralized algorithms, computing nodes communicate only with their neighbors (no central server is involved), which eliminates server failure problems, relieves communication bottlenecks, and protects data privacy. Applications include supercomputers with millions of cores and networked self-driving cars. However, decentralized algorithms often (1) fail to converge, (2) require more computation time, and/or (3) incur excessive communication compared to single-machine algorithms. Our algorithm, Diffusion AVRG, solves (1) and outperforms state-of-the-art algorithms on both (2) and (3).
Fig. 1 Our algorithm, under the "best" setting shown in the figure, reduces the time cost of a standard machine learning task from 21.3 to 4.4 units of time.
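To make the neighbor-only communication pattern concrete, here is a minimal sketch of plain diffusion gradient descent (adapt-then-combine) on a ring network. It is not Diffusion AVRG itself, which additionally uses variance-reduced stochastic gradients. The quadratic local costs, the ring topology, the uniform combination weights, and all names (`centers`, `diffusion_step`, `run_diffusion`) are assumptions chosen for illustration.

```python
def diffusion_step(w, centers, neighbors, mu):
    """One adapt-then-combine step; node k holds the scalar estimate w[k]."""
    # Adapt: each node takes a gradient step on its own cost
    # f_k(w) = 0.5 * (w - c_k)^2, whose gradient is (w - c_k).
    psi = [w_k - mu * (w_k - c_k) for w_k, c_k in zip(w, centers)]
    # Combine: each node averages the intermediate estimates of itself
    # and its direct neighbors only (no central server).
    return [
        sum(psi[l] for l in neighbors[k]) / len(neighbors[k])
        for k in range(len(w))
    ]


def run_diffusion(centers, mu=0.05, iters=500):
    n = len(centers)
    # Ring topology (assumed): node k talks to k-1, itself, and k+1.
    neighbors = [[(k - 1) % n, k, (k + 1) % n] for k in range(n)]
    w = [0.0] * n
    for _ in range(iters):
        w = diffusion_step(w, centers, neighbors, mu)
    return w


# The minimizer of the sum of the local costs is the mean of the centers (3.0).
centers = [1.0, 2.0, 3.0, 4.0, 5.0]
estimates = run_diffusion(centers)
```

With a constant step size, every node ends up in a small neighborhood of the global minimizer rather than exactly on it; shrinking `mu` shrinks that neighborhood at the cost of slower convergence.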
Neural Network-based Language Models (slides)
Markov chain-based n-gram models dominated language modeling until the early 2000s, when Bengio came up with the first decent neural language model. Long Short-Term Memory (LSTM) was later found to imitate human language so well that it is still widely used today. Finally, we talked about recent modeling, optimization, and regularization techniques that deliver state-of-the-art performance with LSTMs.
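As a reminder of what the n-gram baseline looks like, here is a minimal sketch of a bigram (first-order Markov) model with maximum-likelihood estimates. The toy sentence and the function names are assumptions for illustration; a practical model would add smoothing for unseen pairs.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


tokens = "the cat sat on the mat".split()
model = train_bigram(tokens)
p_cat_given_the = bigram_prob(model, "the", "cat")  # "the" precedes "cat" and "mat"
```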
In large-scale optimization, it's often useful to eliminate the server node, but we then usually have to pay for it with more inter-node communication. The balance between computation and communication is therefore a central concern in such algorithms. I talked about Professor Guanghui Lan's recent techniques: Focus-on-Primal and Gradient Sliding.
Matrix Estimation with Rank Constraints (slides)
Recommendation systems can be formulated as a matrix completion problem: we estimate the missing entries of a matrix from a sample of observed entries, under certain constraints (such as low rank) on the whole matrix. We talked about probabilistic bounds on the estimation error and presented a recent collection of non-convex methods for large-scale estimation.
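The non-convex approach can be sketched as follows: write the unknown matrix as a low-rank product and run gradient descent on the squared error over the observed entries only. This is a generic rank-1 factorization sketch, not one of the specific methods covered in the talk; the toy data, step size, iteration count, and names are all assumptions.

```python
def complete_rank1(observed, n_rows, n_cols, lr=0.02, iters=5000):
    """Fit M[i][j] ~ a[i] * b[j] on the observed entries by gradient descent.

    `observed` maps (row, col) -> value; missing entries are simply absent.
    """
    a = [1.0] * n_rows
    b = [1.0] * n_cols
    for _ in range(iters):
        grad_a = [0.0] * n_rows
        grad_b = [0.0] * n_cols
        # Gradient of 0.5 * sum over observed (a[i] * b[j] - M[i][j])^2.
        for (i, j), value in observed.items():
            residual = a[i] * b[j] - value
            grad_a[i] += residual * b[j]
            grad_b[j] += residual * a[i]
        a = [a_i - lr * g for a_i, g in zip(a, grad_a)]
        b = [b_j - lr * g for b_j, g in zip(b, grad_b)]
    return a, b


# Ground truth is the rank-1 matrix [1, 2, 3]^T [1, 2]; entry (2, 1) = 6 is hidden.
observed = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0, (2, 0): 3.0}
a, b = complete_rank1(observed, n_rows=3, n_cols=2)
predicted_missing = a[2] * b[1]
```

The rank constraint is what makes the hidden entry recoverable: the observed entries fix the ratio between the two columns, and the single observed entry in the last row then pins down the missing one.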
Natural Language Processing with Distributed Learning (video)
Cost and privacy concerns have motivated the development of distributed learning algorithms. We introduce three distributed learning strategies in the presentation: Federated Learning, Diffusion Learning, and Incremental Learning. We apply them to a neural network model and an NLP task (word2vec).
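The difference between two of these strategies can be sketched on a toy problem. In the federated scheme below, every node refines a copy of the shared model locally and a server averages the results; in the incremental scheme, a single model is passed from node to node around a cycle. The scalar quadratic costs stand in for the neural network and word2vec losses used in the presentation, and all names and hyperparameters are assumptions for illustration.

```python
def local_grad(w, c):
    # Gradient of the assumed local cost f_k(w) = 0.5 * (w - c)^2.
    return w - c


def federated_round(w, centers, mu, local_steps):
    """Each node runs a few local gradient steps; the server averages the models."""
    local_models = []
    for c in centers:
        w_k = w
        for _ in range(local_steps):
            w_k -= mu * local_grad(w_k, c)
        local_models.append(w_k)
    return sum(local_models) / len(local_models)


def incremental_cycle(w, centers, mu):
    """One model visits the nodes in order; each applies one gradient step."""
    for c in centers:
        w -= mu * local_grad(w, c)
    return w


centers = [1.0, 2.0, 3.0, 4.0, 5.0]  # the joint minimizer is their mean, 3.0
w_fed = w_inc = 0.0
for _ in range(2000):
    w_fed = federated_round(w_fed, centers, mu=0.01, local_steps=5)
    w_inc = incremental_cycle(w_inc, centers, mu=0.01)
```

On this toy problem the federated average settles at the joint minimizer, while the incremental iterate, read at the end of each cycle, settles slightly off it because of the constant step size and the fixed visiting order.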
Blockchain and Cryptocurrency (HBS RA reading group, slides)
- What is blockchain technology, and what are cryptocurrencies such as Bitcoin?
- What are the interesting questions on this topic, and where can we find data to answer them?
- What is the current progress in the literature?