The 6th International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

May 29, 2017
Orlando, Florida, USA

In conjunction with the 31st IEEE International Parallel & Distributed Processing Symposium
May 29-June 2, 2017
Buena Vista Palace Hotel
Orlando, Florida USA

Change in keynote speaker

The afternoon keynote talk will be given by Peter Kogge on "Big Graphs: Sparsity, Irregularity, and Streaming: the Need for Innovation in Architecture." Due to a personal emergency, Dr. Wei Tan will not give the previously scheduled talk.

Best Paper Award

The Best Paper Award goes to Azalia Mirhoseini, Bita Rouhani, Ebrahim Songhori, and Farinaz Koushanfar for their paper ExtDict: Extensible Dictionaries for Data- and Platform-Aware Large-Scale Learning. Congratulations!

Advance Program

Time | Title | Authors/Speaker
8:15-8:30am | Opening remarks |
8:30-9:30am | Keynote 1: Why Tables and Graphs for Knowledge Discovery Systems | John Feo, Northwest Institute for Advanced Computing
9:30-10:00am | Break |
10:00-10:30am | ExtDict: Extensible Dictionaries for Data- and Platform-Aware Large-Scale Learning (ParLearning-01) | Azalia Mirhoseini, Bita Rouhani, Ebrahim Songhori and Farinaz Koushanfar
10:30-11:00am | Coded TeraSort (ParLearning-02) | Songze Li, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali and Salman Avestimehr
11:00-11:30am | Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing (ParLearning-03) | Nitin Gawande, Joshua Landwehr, Jeff Daily, Nathan Tallent, Abhinav Vishnu and Darren Kerbyson
11:30-12:00pm | Efficient and Portable ALS Matrix Factorization for Recommender Systems (ParLearning-04) | Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen and Canqun Yang
12:00-1:30pm | Lunch |
1:30-2:30pm | Keynote 2: Big Graphs: Sparsity, Irregularity, and Streaming: the Need for Innovation in Architecture (replaces the previously scheduled "Matrix Factorization on GPUs: A Tale of Two Algorithms" by Wei Tan) | Peter Kogge, University of Notre Dame, USA
2:30-3:00pm | Large-Scale Stochastic Learning using GPUs (ParLearning-05) | Thomas Parnell, Celestine Duenner, Kubilay Atasu, Manolis Sifalakis and Haris Pozidis
3:00-3:30pm | Break |
3:30-3:50pm | Distributed and in-situ machine learning for smart-homes and buildings: application to alarm sounds detection (ParLearning-06) | Amaury Durand, Yanik Ngoko and Christophe Cérin
3:50-4:10pm | The New Large-Scale RNNLM System Based On Distributed Neuron (ParLearning-07) | Dejiao Niu, Rui Xue, Tao Cai, Hai Li and Effah Kingsley
4:10-4:30pm | A Cache Friendly Parallel Encoder-Decoder Model without Padding on Multi-core Architecture (ParLearning-08) | Yuchen Qiao, Kenjiro Taura, Kazuma Hashimoto, Yoshimasa Tsuruoka and Akiko Eriguchi
4:30-4:45pm | Discussion of ParLearning 2018 |

Keynote talk 1

Dr. John Feo, Northwest Institute for Advanced Computing

Why Tables and Graphs for Knowledge Discovery Systems

Abstract: The availability of data is changing the way science, business, and law enforcement operate. Economic competitiveness and national security depend increasingly on the insightful analysis of large data sets. The breadth of analytic processes is forcing knowledge discovery platforms to supplement traditional table-based methods with graph methods that provide better support for sparse data and dynamic relationships among typed entities. While storing data in only tables makes it difficult to discover complex patterns of activities in time and space, tables are the most efficient data structures for storing dense node and edge attributes, and executing simple select and join operations. Consequently, knowledge discovery systems must support both in a natural way without preference. In this talk, I will describe the hybrid data model, SHAD, that we are developing to support both graphs and tables. I will present several high-performance, scalable, analytic platforms developed by PNNL for graph analytics, machine learning, and knowledge discovery. I will include an overview of HAGGLE, our proposed graph analytic platform for new architectures.

Bio: DR. JOHN FEO is the Director of the Northwest Institute for Advanced Computing, a joint institute established by Pacific Northwest National Laboratory and University of Washington. Previously, he managed a large DOD research project in graph algorithms, search, parallel computing, and multithreaded architectures. Dr. Feo received his Ph.D. in Computer Science from The University of Texas at Austin. He began his career at Lawrence Livermore National Laboratory where he managed the Computer Science Group and was the principal investigator of the Sisal Language Project. Dr. Feo then joined Tera Computer Company (now Cray Inc.) where he was a principal engineer and product manager for the MTA-1 and MTA-2, the first two generations of Cray’s multithreaded architecture. He has taken short sabbaticals to work at Sun Microsystems, Microsoft, and Context Relevant. Dr. Feo has held academic positions at UC Davis and Washington State University.

Keynote talk 2

Dr. Peter Kogge, University of Notre Dame, USA

Big Graphs: Sparsity, Irregularity, and Streaming: the Need for Innovation in Architecture

Bio: PETER M. KOGGE received his Ph.D. in EE from Stanford in 1973. From 1968 until 1994 he was with IBM's Federal Systems Division, and was appointed an IBM Fellow in 1993. In August 1994, he joined the University of Notre Dame as first holder of the endowed McCourtney Chair in Computer Science and Engineering. He has served as both Department Chair and Associate Dean for Research, College of Engineering. He is an IEEE Fellow, a Distinguished Visiting Scientist at JPL, and a founder of Emu Solutions, Inc. He holds over 40 patents and is the author of two books, including the first text on pipelining. His Ph.D. thesis led to the Kogge-Stone adder used in many microprocessors. Other projects included EXECUBE, the world's first multi-core processor and first processor on a DRAM chip; the IBM 3838 Array processor, which was for a time the fastest floating point machine marketed by IBM; and the IOP, the world’s second multi-threaded parallel processor, which flew on every Space Shuttle. In 2008, he led DARPA’s Exascale technology study group, which resulted in a widely referenced report on technologies for exascale computing, and has had key roles on many other HPC programs. He has received the Daniel Slotnick best paper award (1994), the IEEE Seymour Cray award for high performance computer engineering (2012), the IEEE Charles Babbage award for contributions to the evolution of massively parallel processing architectures (2014), the IEEE Computer Pioneer award (2015), and the Gauss best paper award (2015). His interests are in massively parallel computing paradigms, processing in memory, and the relationship between emerging technology and computer architectures.

Due to a personal emergency, Dr. Wei Tan will not give the previously scheduled talk.

Dr. Wei Tan, IBM T. J. Watson Research Center, NY, USA

Matrix Factorization on GPUs: A Tale of Two Algorithms

Abstract: Matrix factorization (MF) is an approach to derive latent features from observations. It is at the heart of many algorithms, e.g., collaborative filtering, word embedding and link prediction. Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular methods for solving MF. SGD converges fast, while ALS is easy to parallelize and able to deal with non-sparse ratings. In this talk, I will introduce cuMF, a CUDA-based matrix factorization library that accelerates both ALS and SGD to solve very large-scale MF. cuMF uses a set of techniques to maximize the performance on single and multiple GPUs. These techniques include smart access of sparse data leveraging memory hierarchy, using data parallelism with model parallelism, approximate algorithms and storage. With only a single machine with up to four Nvidia GPU cards, cuMF can be 10 times as fast, and 100 times as cost-efficient, compared with the state-of-the-art distributed CPU solutions. In this talk I will also share lessons learned in accelerating compute- and memory-intensive kernels on GPUs.

Bio: DR. WEI TAN is a Research Staff Member at IBM T. J. Watson Research Center. His research interests include big data, distributed systems, NoSQL and services computing. Currently he works on accelerating machine learning algorithms using scale-up (e.g., GPU) and scale-out (e.g., Spark) approaches. His work has been incorporated into IBM's patent portfolio and software products such as Spark, BigInsights and Cognos. He received the IEEE Peter Chen Big Data Young Researcher Award (2016), Best Paper Award at ACM/IEEE ccGrid 2015, IBM Outstanding Technical Achievement Award (2017, 2016 and 2014), Best Student Paper Award at IEEE ICWS 2014, Best Paper Award at IEEE SCC 2011, Pacesetter Award from Argonne National Laboratory (2010), and caBIG Teamwork Award from the National Institutes of Health (2008).

Call for Papers

Scaling up machine-learning (ML), data mining (DM) and reasoning algorithms from Artificial Intelligence (AI) for massive datasets is a major technical challenge in the era of "Big Data". The past ten years have seen the rise of multi-core and GPU-based computing. In parallel and distributed computing, several frameworks such as OpenMP, OpenCL, and Spark continue to facilitate scaling up ML/DM/AI algorithms using higher levels of abstraction. We invite novel works that advance the three fields of ML/DM/AI through the development of scalable algorithms or computing frameworks. Ideal submissions would be characterized as scaling up X on Y, where potential choices for X and Y are provided below.

Scaling up

  • recommender systems
  • gradient descent algorithms
  • deep learning
  • sampling/sketching techniques
  • clustering (agglomerative techniques, graph clustering, clustering heterogeneous data)
  • classification (SVM and other classifiers)
  • SVD
  • probabilistic inference (Bayesian networks)
  • logical reasoning
  • graph algorithms and graph mining

On


  • Multi-core architectures/frameworks (OpenMP)
  • Many-core (GPU) architectures/frameworks (OpenCL, OpenACC, CUDA, Intel TBB)
  • Distributed systems/frameworks (GraphLab, MPI, Hadoop, Spark, Storm, Mahout, etc.)

Proceedings of the ParLearning workshop will be distributed at the conference and will be submitted for inclusion in the IEEE Xplore Digital Library after the conference.

Journal publication

Selected papers from the workshop will be published in a Special Issue of Future Generation Computer Systems, Elsevier's International Journal of eScience. Special Issue papers will undergo additional review.


Best Paper Award: The program committee will nominate a paper for the Best Paper award. In past years, the Best Paper award included a cash prize. Stay tuned for this year!

Travel awards: Students with accepted papers have a chance to apply for a travel award. Please find details on the IEEE IPDPS web page.

Important Dates

  • Paper submission: January 20, 2017 AoE (originally January 13)
  • Notification: February 10, 2017
  • Camera Ready: March 22, 2017 (originally March 10)

Paper Guidelines

Submitted manuscripts should be up to 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style), including figures, tables, and references. Format requirements are posted on the IEEE IPDPS web page.

All submissions must be uploaded electronically at

Organization


  • General chair: Anand Panangadan (California State University, Fullerton, USA)
  • Technical Program co-chairs: Henri Bal (Vrije Universiteit, The Netherlands) and Arindam Pal (TCS Innovation Labs, India)
  • Publicity chair: Charalampos Chelmis (University at Albany, State University of New York, USA)
  • Steering Committee chair: Yinglong Xia (Huawei Research, USA)

Technical Program Committee

  • Snehasis Banerjee, TCS Research, India
  • Brojeshwar Bhowmick, TCS Research, India
  • Danny Bickson, GraphLab Inc., USA
  • Vito Giovanni Castellana, Pacific Northwest National Laboratory, USA
  • Tanushyam Chattopadhyay, TCS Research, India
  • Daniel Gerardo Chavarria, Pacific Northwest National Laboratory, USA
  • Sutanay Choudhury, Pacific Northwest National Laboratory, USA
  • Valeriu Codreanu, SURFsara, The Netherlands
  • Lipika Dey, TCS Research, India
  • Zhihui Du, Tsinghua University, China
  • Anand Eldawy, University of Minnesota, USA
  • Dinesh Garg, IBM Research, India
  • Saptarshi Ghosh, IIEST Shibpur, India
  • Dianwei Han, Northwestern University, USA
  • Renato Porfirio Ishii, Federal University of Mato Grosso do Sul (UFMS), Brazil
  • Ananth Kalyanaraman, Washington State University, USA
  • Gwo Giun (Chris) Lee, National Cheng Kung University, Taiwan
  • Carson Leung, University of Manitoba, Canada
  • Animesh Mukherjee, IIT Kharagpur, India
  • Debnath Mukherjee, TCS Research, India
  • Francesco Parisi, University of Calabria, Italy
  • Lijun Qian, Prairie View A&M University, USA
  • Himadri Sekhar Paul, TCS Research, India
  • Aske Plaat, Leiden University, The Netherlands
  • Chandan Reddy, Wayne State University, USA
  • Rekha Singhal, TCS Research, India
  • Weiqin Tong, Shanghai University, China
  • Cedric van Nugteren, TomTom International BV, The Netherlands
  • Zhuang Wang, Facebook, USA
  • Qingsong Wen, Georgia Institute of Technology, USA
  • Lingfei Wu, IBM T. J. Watson Research Center, USA
  • Bo Zhang, IBM, USA
  • Jianting Zhang, City College of New York, USA

Past workshops