Cost and privacy concerns have motivated the development of distributed learning algorithms. In this presentation we introduce three distributed learning strategies: Federated Learning, Diffusion Learning, and Incremental Learning, and apply them to a neural network model and an NLP task (word2vec).
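To make the federated strategy concrete, here is a minimal FedAvg-style sketch: clients take a few local gradient steps, and a server averages the results weighted by data size. The linear-regression model, step sizes, and client data are illustrative assumptions, not details from the talk.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=10):
    # A few local gradient steps on squared loss for a linear model.
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    # Each client trains locally; the server averages weighted by data size.
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_sgd(w_global.copy(), X, y))
        sizes.append(len(y))
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(a * u for a, u in zip(weights, updates))

# Synthetic clients sharing one true model (an assumption for illustration).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = [(X, X @ w_true) for X in (rng.normal(size=(50, 2)) for _ in range(3))]

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print(np.round(w, 2))  # approaches w_true without any client sharing raw data
```

The privacy appeal is visible in the sketch: only model parameters cross the network, never the clients' `(X, y)` data.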
Neural Network based Language Models (slides)
Markov-chain-based n-gram models dominated language modelling until the early 2000s, when Bengio et al. proposed the first successful neural language model. The Long Short-Term Memory (LSTM) network was later found to model human language so well that it is still widely used today. Finally, we discussed recent modelling, optimization, and regularization techniques that deliver state-of-the-art performance with LSTMs.
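For context, the n-gram baseline above estimates P(word | previous word) from counts. A minimal bigram sketch (the toy corpus and the unsmoothed maximum-likelihood estimate are illustrative assumptions):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

# Count bigrams and the contexts (previous words) they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
context = Counter(corpus[:-1])

def bigram_prob(prev, word):
    # Maximum-likelihood estimate: count(prev, word) / count(prev).
    return bigrams[(prev, word)] / context[prev]

print(bigram_prob("the", "cat"))  # 2 of the 3 "the" contexts precede "cat"
```

The sparsity problem is already visible here: any unseen bigram gets probability zero, which is exactly the weakness neural language models address by sharing statistical strength through word embeddings.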
In large-scale optimization it is often useful to eliminate the server node, but doing so typically costs more inter-node communication. Balancing computation against communication is therefore a central concern in such algorithms. I talked about Professor Guanghui Lan's recent techniques: Focus-on-Primal and Gradient Sliding.
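To illustrate the serverless setting (not Gradient Sliding itself, which is more involved), here is a plain decentralized gradient sketch: a doubly stochastic mixing matrix replaces the server, so each node averages with its neighbours and then takes a local gradient step. The three-node topology and quadratic objectives are assumptions for illustration.

```python
import numpy as np

# Three nodes, each holding a local quadratic f_i(x) = (x - b_i)^2 / 2;
# the global minimizer is mean(b). No server: a doubly stochastic mixing
# matrix W lets nodes average only with their neighbours.
b = np.array([1.0, 2.0, 6.0])
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

x = np.zeros(3)                      # each node's current estimate
for t in range(1, 2001):
    x = W @ x - (1.0 / t) * (x - b)  # gossip averaging + local gradient step
print(np.round(x, 2))                # all nodes approach mean(b) = 3.0
```

Each iteration costs one round of neighbour-to-neighbour communication (`W @ x`) on top of the local gradient, which is exactly the computation/communication trade-off mentioned above.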
Matrix Estimation with Rank Constraints (slides)
Recommendation systems can be formulated as matrix completion problems: we estimate the missing entries of a matrix from a sample of observed entries, under certain constraints (such as low rank) on the whole matrix. We talked about probabilistic bounds on the estimation error and presented a recent collection of non-convex methods for large-scale estimation.
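As one representative non-convex method, here is a minimal alternating-least-squares sketch for low-rank matrix completion. The rank, matrix sizes, and 50% observation rate are illustrative assumptions; real recommendation data would also need regularization and noise handling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth rank-2 matrix and a random 50% observation mask.
n, m, r = 30, 20, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))
mask = rng.random((n, m)) < 0.5

# Alternating least squares: with V fixed, each row of U is a small
# least-squares problem over that row's observed entries, and vice versa.
U = rng.normal(size=(n, r))
V = rng.normal(size=(m, r))
for _ in range(30):
    for i in range(n):
        obs = mask[i]
        U[i] = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)[0]
    for j in range(m):
        obs = mask[:, j]
        V[j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]

err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"relative error: {err:.2e}")  # near zero when recovery succeeds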