With the rapid development of machine learning (especially deep learning) and cloud computing, it has become a trend to train machine learning models in a distributed manner on a cluster of machines. In recent years, there has been much exciting progress in this direction, with quite a few papers published and several open-source projects released. For example, distributed machine learning tools such as Petuum, TensorFlow, and DMTK have been developed; parallel learning algorithms such as LightLDA, parallel logistic regression, and XGBoost have been proposed; and convergence theory for both synchronous and asynchronous parallelization has been established. However, there are also many open issues in this field, for example:
- How to select an appropriate infrastructure (e.g., parameter server vs. data flow) and parallelization mechanism (e.g., synchronous vs. asynchronous), given the application and system configuration?
- Why do many papers report linear speed-ups, while the speed-up turns out to be far smaller once a given accuracy on real-world workloads is required?
- Why can parallelization mechanisms with similar convergence rates perform so differently in practice?
- How to conduct a proper comparison and evaluation of distributed machine learning (e.g., benchmarks, criteria, system configurations, and baselines)?
Without answers to these important questions, it is hard to be confident about the wide adoption of distributed machine learning in real applications. This workshop is designed to address these questions. With it, we hope to provide the community with deep insights and to substantially push the frontier of distributed machine learning.
The workshop will consist of invited talks, contributed talks, and a panel discussion. The contributed talks mainly call for blue-sky ideas, but ongoing research is also welcome. You are highly encouraged to submit your ideas or work to the workshop and share them with its wide audience. Papers are encouraged to focus on (but are not limited to) the following topics:
- Distributed machine learning systems and infrastructure
- Parallelization mechanisms for distributed machine learning
- Parallel machine learning algorithms
- Theory for distributed machine learning
- Toolkits for distributed machine learning
- Applications of distributed machine learning
To submit a paper to the workshop, please go to https://easychair.org/conferences/?conf=dml2017
Tie-Yan Liu (email@example.com)
James Kwok (firstname.lastname@example.org)
Chih-Jen Lin (email@example.com)